Follow-up Models: Gaze-only and Hand-only
Data Quality & Coverage Gate
This gate checks participant counts and condition coverage before modelling, so that models are not fit (and misread) when design cells are missing. Current data: N = 81 participants.
Participant counts. Note: 'df' refers to correct trials, RT ∈ [150, 6000] ms, non-practice.
|Measure | Participants|
|:-----------------------------------------|------------:|
|Total participants (raw) | 81|
|Participants with any valid trials | 81|
|Participants in df (correct, RT-filtered) | 81|
Condition coverage (modality × ui_mode × pressure)
|Modality |UI mode |Pressure | Trials| Participants|Flag |Status |
|:--------|:--------|:--------|------:|------------:|:-----|:------|
|hand |static |0 | 1998| 73|FALSE |OK |
|hand |static |1 | 2038| 75|FALSE |OK |
|hand |adaptive |0 | 2025| 74|FALSE |OK |
|hand |adaptive |1 | 2052| 75|FALSE |OK |
|gaze |static |0 | 2125| 78|FALSE |OK |
|gaze |static |1 | 2158| 79|FALSE |OK |
|gaze |adaptive |0 | 2182| 80|FALSE |OK |
|gaze |adaptive |1 | 2133| 78|FALSE |OK |
All factors have ≥2 levels in the data.
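The coverage gate can be illustrated with a small check over the per-cell trial counts. This is a minimal sketch, not the report's own implementation: the counts are read from the coverage table on the assumption that the first numeric column is the trial count, and the `MIN_TRIALS` threshold and the `coverage_report` helper are hypothetical names introduced for illustration.

```python
# Trial counts per (modality, ui_mode, pressure) cell, read from the coverage table.
cells = {
    ("hand", "static", 0): 1998,
    ("hand", "static", 1): 2038,
    ("hand", "adaptive", 0): 2025,
    ("hand", "adaptive", 1): 2052,
    ("gaze", "static", 0): 2125,
    ("gaze", "static", 1): 2158,
    ("gaze", "adaptive", 0): 2182,
    ("gaze", "adaptive", 1): 2133,
}

MIN_TRIALS = 1  # hypothetical threshold: a cell with fewer trials is flagged


def coverage_report(cell_counts, min_trials=MIN_TRIALS):
    """Label each design cell 'OK' or 'LOW' depending on its trial count."""
    return {
        cell: ("OK" if n >= min_trials else "LOW")
        for cell, n in cell_counts.items()
    }


report = coverage_report(cells)
total_trials = sum(cells.values())
# Number of distinct levels observed for each of the three factors.
factor_levels = [len({cell[i] for cell in cells}) for i in range(3)]

print(report)
print(total_trials, factor_levels)
```

A model would only be fit when every cell is labelled `OK` and every factor has at least two levels.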
Blocks logged per participant
|Blocks logged |Participants | Count|
|:-------------|:------------|-----:|
|8 |P001, P003–P029, P031, P032, P035, P036, P038–P042, P045–P047, P049–P051, P054, P055, P058, P060, P062, P064–P070, P074–P077, P079–P088 | 69|
|7 |P002, P048, P078 | 3|
|6 |P057 | 1|
|5 |P073 | 1|
|4 |P030, P037, P043, P059, P061, P063, P072 | 7|
1. Executive Summary
This report analyzes 81 participants performing Fitts’ law pointing tasks across two input modalities (Hand, Gaze), two UI modes (Static, Adaptive), and two pressure levels (coded 0 and 1).
Results Snapshot (N = 81)
### Gaze: Adaptive vs Static (Primary Adaptive Test)
Gaze contrasts: adaptive - static (by pressure)
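|Modality |Pressure | Δ TP (bits/s)| Δ log RT| Δ error rate|
|:--------|:--------|-------------:|---------:|------------:|
|gaze |0 | -0.0237599| 0.0356422| -0.0005056|
|gaze |1 | -0.1245809| 0.0529007| -0.0156147|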
### Hand: Pressure Effect (UI Mode Not Exercised)
Hand contrasts: pressure ON - pressure OFF
|Modality | Δ TP (bits/s)| Δ log RT| Δ error rate|
|:--------|-------------:|---------:|------------:|
|hand | -0.0737157| 0.0202401| 0.000645|
*Note:* Hand width inflation did not activate (width_scale_factor always 1); UI mode is not interpreted as an adaptive manipulation for hand. The gaze-only UI Mode × Pressure model is the primary test of adaptive vs static effects.
RQ2 snapshot: Overall TLX
|Modality |UI mode | Overall TLX|
|:--------|:--------|-----------:|
|gaze |adaptive | 46.8|
|gaze |static | 45.8|
|hand |adaptive | 41.1|
|hand |static | 40.8|
RQ3 manipulation check: width scaling
| gaze |
adaptive |
1 |
0 |
| gaze |
static |
1 |
0 |
| hand |
adaptive |
1 |
0 |
| hand |
static |
1 |
0 |
*Note:* Hand width inflation did not activate; all recorded `width_scale_factor` values equal 1.0. See root-cause diagnostic section for investigation of non-activation.
Key Findings
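- Total Trials Analyzed: 14953 valid trials (correct responses, RT 150–6000 ms)
- Total Trials Collected: 17442
- Overall Error Rate: 10.5% (1758 of the 16711 valid trials entered in the error model)
- Trials Not Analyzed: 14% of collected trials (2489 of 17442; errors plus trials removed by the RT and validity filters)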
- Mean Throughput: 3.36 bits/s (SD = 1.04)
- Mean Movement Time: 1.147s (SD = 0.469s)
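The trial counts reported in this document allow a quick cross-check of the two percentages that are easy to confuse: the error rate among modelled trials and the share of collected trials that do not reach the analyzed set. The sketch below assumes that the 16711 trials in the error-rate model minus the 14953 analyzed trials are all error trials; the variable names are illustrative.

```python
collected = 17442  # total trials collected
modelled = 16711   # valid trials entering the error-rate model
analysed = 14953   # correct trials with RT in [150, 6000] ms

# Error rate: incorrect trials as a share of the valid (modelled) trials.
error_rate = (modelled - analysed) / modelled

# Exclusion share: every collected trial that is not in the analyzed set,
# whether it was an error or was removed by a filter.
excluded_share = (collected - analysed) / collected

print(f"error rate among valid trials: {error_rate:.1%}")
print(f"share of collected trials not analyzed: {excluded_share:.1%}")
```

Under these assumptions the first figure matches the 10.5% reported in the error-rate section, while a value near 14% corresponds to the exclusion share rather than to an error rate.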
2. Demographics
Sample Size: N = 81 participants.
Overall Demographics
| 81 |
30.1 |
8 |
18 - 62 |
1.5 |
4.3 |
By Gender
| female |
36 |
30.3 |
8.7 |
0.1 |
| male |
45 |
30.0 |
7.5 |
2.6 |
Gaming Status
Participants were primarily non-gamers (median self-reported gaming = 0 hours/week; only 11.1% reported ≥5 hours/week).
3. Primary Analysis: Throughput
Research Question: Does the Adaptive UI improve performance (Throughput) compared to Static, especially for Gaze?
Sample Size: N = 75 participants for hand modality (mouse users only), N = 81 participants for gaze modality (mouse + trackpad users). See Data Quality section for input device exclusion rationale.
Analysis Note: We observe a highly significant main effect of modality (hand > gaze) on throughput, although its standardized size is small (η²p = 0.038). Interaction effects are treated as exploratory.
Summary Statistics
Throughput (bits/s) by Condition (N = 81 participants)
|Modality |UI mode |Pressure | Participants| Observations| Mean| SD| Median| Q1| Q3|
|:--------|:--------|:--------|------------:|------------:|----:|----:|------:|----:|----:|
|hand |static |0 | 73| 219| 3.56| 0.92| 3.48| 2.94| 4.06|
|hand |static |1 | 75| 224| 3.53| 0.98| 3.49| 2.92| 4.16|
|hand |adaptive |0 | 74| 222| 3.59| 0.96| 3.61| 2.86| 4.25|
|hand |adaptive |1 | 75| 225| 3.47| 0.95| 3.49| 2.68| 4.18|
|gaze |static |0 | 77| 227| 3.23| 1.06| 3.10| 2.49| 3.80|
|gaze |static |1 | 78| 231| 3.22| 1.08| 3.08| 2.51| 3.74|
|gaze |adaptive |0 | 80| 233| 3.18| 1.09| 3.01| 2.43| 3.85|
|gaze |adaptive |1 | 78| 226| 3.10| 1.11| 2.86| 2.34| 3.61|
Statistical Model Results
Planned Sample Size & Power
The throughput analysis was designed for a within-subjects 2×2×2 factorial (modality × UI mode × pressure). However, the HAND adaptive manipulation (width inflation) did not execute (width_scale_factor always 1), so UI mode is not interpretable as an adaptation for hand. The primary adaptive test is the gaze-only UI Mode × Pressure interaction, which evaluates whether declutter (the gaze adaptive manipulation that did execute) improves performance. Standard repeated-measures power calculations and guidelines (Cohen, 1988; Brysbaert, 2019) indicate that N ≈ 50 participants is sufficient for 80% power to detect dz ≈ 0.40. We therefore set N = 48 (six complete Williams sequences) as the primary design target, with the option to extend to N = 64 (eight sequences) if recruitment permits. Given the large number of trials per condition and the mixed-effects model (random intercepts per participant), this sample size is expected to provide high power for modality main effects and gaze-only adaptive effects, while the omnibus UI mode effect is diluted by hand non-manipulation and should be interpreted via the targeted gaze-only follow-up.
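The sample-size reasoning above can be approximated with a normal-approximation power calculation for a paired (within-subject) contrast. This is a rough sketch rather than the calculation actually used for planning: it ignores the t-distribution correction and the mixed-model structure, so it is slightly optimistic, and the function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def n_for_power(dz, power=0.80, alpha=0.05):
    """Approximate participants needed to detect a standardized paired effect dz."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_power) / dz) ** 2)


def approx_power(dz, n, alpha=0.05):
    """Approximate power of a two-sided paired test with n participants."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(dz * sqrt(n) - z_alpha)


n_needed = n_for_power(0.40)
power_48 = approx_power(0.40, 48)
power_64 = approx_power(0.40, 64)

print(n_needed, round(power_48, 2), round(power_64, 2))
```

Under this approximation, dz = 0.40 needs about 50 participants for 80% power, N = 48 falls just short of 0.80, and N = 64 gives a more comfortable margin.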
### Model: TP ~ modality * ui_mode * pressure + (1 | pid)
**Model Structure:** Random intercept only (participant-level random effects). The model is fit to 1807 throughput observations computed from ≈ 14953 analyzed trials; neither count should be mistaken for independent units in discussions of power. The effective N for inference is the number of participants (81).
**Data Summary:** 81 participants, 1807 throughput observations (from 14953 trials), 8 conditions, minimum 219 observations per condition.
#### ANOVA Table
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
modality 45.690 45.690 1 1743.3 69.7142 <2e-16 ***
ui_mode 1.199 1.199 1 1726.8 1.8295 0.1764
pressure 1.699 1.699 1 1727.5 2.5920 0.1076
modality:ui_mode 0.537 0.537 1 1727.0 0.8193 0.3655
modality:pressure 0.025 0.025 1 1727.3 0.0388 0.8439
ui_mode:pressure 0.788 0.788 1 1728.5 1.2028 0.2729
modality:ui_mode:pressure 0.001 0.001 1 1727.4 0.0010 0.9753
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
**Analysis Note:** At N = 81, 3-way interactions may be underpowered. Non-significant interaction effects should be treated as exploratory.
#### Model Summary
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula: formula_tp$formula
Data: df_tp_model
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik -2*log(L) df.resid
4597.4 4652.4 -2288.7 4577.4 1797
Scaled residuals:
Min 1Q Median 3Q Max
-2.5970 -0.7147 -0.0052 0.6421 4.4587
Random effects:
Groups Name Variance Std.Dev.
pid (Intercept) 0.3832 0.6190
Residual 0.6554 0.8096
Number of obs: 1807, groups: pid, 81
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.344e+00 7.147e-02 8.119e+01 46.798 <2e-16 ***
modality1 1.622e-01 1.943e-02 1.743e+03 8.350 <2e-16 ***
ui_mode1 2.579e-02 1.907e-02 1.727e+03 1.353 0.176
pressure1 3.072e-02 1.908e-02 1.727e+03 1.610 0.108
modality1:ui_mode1 -1.726e-02 1.907e-02 1.727e+03 -0.905 0.366
modality1:pressure1 3.757e-03 1.907e-02 1.727e+03 0.197 0.844
ui_mode1:pressure1 -2.095e-02 1.911e-02 1.729e+03 -1.097 0.273
modality1:ui_mode1:pressure1 -5.906e-04 1.908e-02 1.727e+03 -0.031 0.975
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) mdlty1 ui_md1 prssr1 md1:_1 mdl1:1 u_m1:1
modality1 0.011
ui_mode1 0.001 0.000
pressure1 0.002 0.006 0.007
mdlty1:_md1 0.001 0.003 0.014 -0.004
mdlty1:prs1 0.003 0.002 -0.005 0.016 0.007
u_md1:prss1 0.003 -0.005 0.003 0.006 0.007 0.004
mdlty1:_1:1 -0.002 0.007 0.006 0.000 0.003 0.001 0.011
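The random-effects variances in the model summary above can be turned into an intraclass correlation (ICC), which indicates how much of the throughput variance sits between participants. This is a small worked calculation using the two reported variance components; the report itself does not state an ICC, so the quantity is added here only as an aid to reading the summary.

```python
# Variance components reported under "Random effects".
var_participant = 0.3832  # pid (Intercept) variance
var_residual = 0.6554     # residual variance

# ICC: share of total variance attributable to stable participant differences.
icc = var_participant / (var_participant + var_residual)

print(f"ICC = {icc:.3f}")
```

An ICC in this range means a substantial part of the variability is between participants, which is why the participant count rather than the observation count governs inference.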
#### Written Results (APA Style)
**Modality Effect:** A linear mixed-effects model revealed a significant main effect of input modality on throughput, F(1, 1743.3) = 69.71, p < .001, η²p = 0.038 (small effect).
Hand input produced higher throughput (M = 3.51, 95% CI [3.36, 3.66] bits/s) than gaze input (M = 3.18, 95% CI [3.03, 3.33] bits/s).
**UI Mode Effect (Omnibus):** The main effect of UI mode was non-significant, F(1, 1726.8) = 1.83, p = 0.176, η²p = 0.001 (negligible effect). **Note:** This omnibus UI mode effect is diluted by the fact that HAND width inflation did not execute (width_scale_factor always 1), so UI mode is not interpretable as an adaptive manipulation for hand. The interpretable test of adaptation comes from the **gaze-only UI Mode × Pressure follow-up model** below.
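The partial eta squared values quoted with the F tests can be reproduced from the F statistic and its degrees of freedom. The sketch below assumes the common conversion η²p = (F × df1) / (F × df1 + df2); that the report used exactly this formula is an assumption, although the resulting values agree with the reported ones.

```python
def partial_eta_squared(f_value, df_num, df_den):
    """Convert an F statistic and its degrees of freedom to partial eta squared."""
    return (f_value * df_num) / (f_value * df_num + df_den)


# (F, numerator df, denominator df) taken from the throughput ANOVA table.
effects = {
    "modality": (69.7142, 1, 1743.3),
    "ui_mode": (1.8295, 1, 1726.8),
    "modality:ui_mode": (0.8193, 1, 1727.0),
}

eta = {name: partial_eta_squared(*stats) for name, stats in effects.items()}

for name, value in eta.items():
    print(f"{name}: {value:.4f}")
```

Values below 0.001, such as the modality × UI mode interaction, are better reported as "< 0.001" than as "0.000".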
#### Gaze-Only Follow-up: UI Mode × Pressure (Primary Adaptive Test)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ui_mode 1.20992 1.20992 1 838.12 2.0586 0.1517
pressure 0.77726 0.77726 1 836.92 1.3224 0.2505
ui_mode:pressure 0.36436 0.36436 1 841.02 0.6199 0.4313
**Estimated Marginal Means (Gaze-only):**
Table: Estimated Marginal Means for Throughput: Gaze-only
|UI Mode |Pressure | Mean TP (bits/s)| 95% CI Lower| 95% CI Upper|
|:--------|:--------|----------------:|------------:|------------:|
|Static |0 | 3.23| 3.03| 3.43|
|Adaptive |0 | 3.20| 3.00| 3.40|
|Static |1 | 3.21| 3.02| 3.41|
|Adaptive |1 | 3.10| 2.90| 3.30|
**Key Contrasts (Gaze-only, Holm-adjusted):**
|contrast | estimate| SE| df| t.ratio| p.value|
|:-------------------------------------|--------:|-----:|-------:|-------:|-------:|
|static pressure0 - adaptive pressure0 | 0.033| 0.072| 842.509| 0.453| 1.000|
|static pressure0 - adaptive pressure1 | 0.131| 0.072| 840.480| 1.814| 0.421|
|adaptive pressure0 - static pressure1 | -0.015| 0.072| 840.692| -0.204| 1.000|
|static pressure1 - adaptive pressure1 | 0.113| 0.072| 842.799| 1.564| 0.591|
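The contrasts above carry Holm-adjusted p-values. As a reminder of what that adjustment does, here is a minimal sketch of the Holm step-down procedure; the raw p-values in the example are hypothetical, because the report lists only adjusted values, and `holm_adjust` is an illustrative helper rather than the function used to produce the tables.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the ordering.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]  # hypothetical raw p-values
example = holm_adjust(raw)
print(example)
```

Because the multiplier depends on the size of the whole family of tests, an adjusted p-value can reach 1.000 even when the raw p-value is moderate.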
#### Hand-Only Follow-up: Pressure Effect
*Note:* UI mode is excluded from hand models by design because width scaling did not execute.
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
pressure 1.0731 1.0731 1 818.48 1.6205 0.2034
**Estimated Marginal Means (Hand-only):**
Table: Estimated Marginal Means for Throughput: Hand-only (pressure effect)
|Pressure | Mean TP (bits/s)| 95% CI Lower| 95% CI Upper|
|:--------|----------------:|------------:|------------:|
|0 | 3.57| 3.43| 3.71|
|1 | 3.50| 3.36| 3.64|
**Pressure Contrast (Hand-only, Holm-adjusted):**
|contrast | estimate| SE| df| t.ratio| p.value|
|:---------------------|--------:|-----:|-------:|-------:|-------:|
|pressure0 - pressure1 | 0.07| 0.055| 819.347| 1.272| 0.204|
**Modality × UI Mode Interaction:** The interaction between modality and UI mode was non-significant, F(1, 1727.0) = 0.82, p = 0.366, η²p < 0.001 (negligible effect). There is therefore no evidence that the effect of UI mode on throughput differed between hand and gaze modalities.
#### Effect Size: Hand vs. Gaze (Collapsed Over UI Mode and Pressure)
Table: Estimated Marginal Means for Throughput by Modality (collapsed over UI mode and pressure)
|Modality | Mean TP (bits/s)| 95% CI Lower| 95% CI Upper|
|:--------|----------------:|------------:|------------:|
|Hand | 3.51| 3.36| 3.66|
|Gaze | 3.18| 3.03| 3.33|
**Difference (Hand - Gaze):** 0.32 bits/s
#### Pairwise Comparisons (Holm-adjusted)
Table: Pairwise Comparisons with Effect Sizes (Holm-adjusted p-values)
|contrast | estimate| SE| Cohen's d (approx)|Effect Size |p-value | df|
|:-------------------------------------------------|----------:|---------:|------------------:|:-----------|:-------|--------:|
|hand static pressure0 - gaze static pressure0 | 0.2962922| 0.0772791| 0.426|small |= 0.002 | 1738.000|
|hand static pressure0 - hand adaptive pressure0 | -0.0260350| 0.0773102| -0.037|negligible |= 1.000 | 1733.571|
|hand static pressure0 - gaze adaptive pressure0 | 0.3416609| 0.0768298| 0.494|small |< .001 | 1738.485|
|hand static pressure0 - hand static pressure1 | 0.0258562| 0.0772010| 0.037|negligible |= 1.000 | 1734.303|
|hand static pressure0 - gaze static pressure1 | 0.3094844| 0.0770863| 0.446|small |= 0.001 | 1739.357|
|hand static pressure0 - hand adaptive pressure1 | 0.0859997| 0.0771106| 0.124|negligible |= 1.000 | 1734.249|
|hand static pressure0 - gaze adaptive pressure1 | 0.4363065| 0.0773386| 0.627|medium |< .001 | 1737.768|
|gaze static pressure0 - hand adaptive pressure0 | -0.3223272| 0.0770623| -0.465|small |< .001 | 1738.545|
|gaze static pressure0 - gaze adaptive pressure0 | 0.0453687| 0.0758682| 0.066|negligible |= 1.000 | 1735.084|
|gaze static pressure0 - hand static pressure1 | -0.2704360| 0.0769477| -0.391|small |= 0.007 | 1739.248|
|gaze static pressure0 - gaze static pressure1 | 0.0131922| 0.0760477| 0.019|negligible |= 1.000 | 1735.250|
|gaze static pressure0 - hand adaptive pressure1 | -0.2102925| 0.0768566| -0.304|small |= 0.082 | 1739.212|
|gaze static pressure0 - gaze adaptive pressure1 | 0.1400143| 0.0763446| 0.204|small |= 0.802 | 1733.915|
|hand adaptive pressure0 - gaze adaptive pressure0 | 0.3676959| 0.0765456| 0.534|medium |< .001 | 1738.367|
|hand adaptive pressure0 - hand static pressure1 | 0.0518912| 0.0768864| 0.075|negligible |= 1.000 | 1733.776|
|hand adaptive pressure0 - gaze static pressure1 | 0.3355194| 0.0767695| 0.486|small |< .001 | 1738.908|
|hand adaptive pressure0 - hand adaptive pressure1 | 0.1120347| 0.0767962| 0.162|negligible |= 1.000 | 1733.728|
|hand adaptive pressure0 - gaze adaptive pressure1 | 0.4623415| 0.0771220| 0.666|medium |< .001 | 1738.313|
|gaze adaptive pressure0 - hand static pressure1 | -0.3158047| 0.0762918| -0.460|small |< .001 | 1737.543|
|gaze adaptive pressure0 - gaze static pressure1 | -0.0321765| 0.0754205| -0.047|negligible |= 1.000 | 1733.865|
4. Movement Time Analysis (Core Confirmatory)
Research Question: How does movement time vary across conditions?
This analysis is part of the core confirmatory battery for RQ1 and RQ3. Movement time is mathematically coupled with throughput (TP = ID/RT) and serves as a complementary performance metric.
Sample Size: N = 75 participants for hand modality (mouse users only), N = 81 participants for gaze modality (mouse + trackpad users).
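Relationship to Throughput: The RT patterns mirror throughput for modality: hand is faster than gaze. Unlike the throughput model, the trial-level log-RT model also detects statistically significant main effects of UI mode and pressure and a modality × UI mode interaction, but all three are negligible in size (η²p ≈ 0.001), so they should not be read as robust practical differences.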
Summary Statistics
Movement Time (s) by Condition (N = 81 participants)
|Modality |UI mode |Pressure | Participants| Trials| Mean MT (s)| SD (s)| Median (s)|
|:--------|:--------|:--------|------------:|------:|-----------:|------:|----------:|
|hand |static |0 | 73| 1961| 1.086| 0.425| 1.007|
|hand |static |1 | 75| 2006| 1.106| 0.411| 1.036|
|hand |adaptive |0 | 74| 1994| 1.080| 0.405| 1.012|
|hand |adaptive |1 | 75| 2012| 1.100| 0.407| 1.024|
|gaze |static |0 | 78| 1734| 1.168| 0.485| 1.074|
|gaze |static |1 | 78| 1715| 1.184| 0.480| 1.083|
|gaze |adaptive |0 | 80| 1782| 1.235| 0.579| 1.093|
|gaze |adaptive |1 | 78| 1749| 1.242| 0.523| 1.119|
Statistical Model Results
Planned Sample Size & Power
The log-RT analysis uses the same 2×2×2 within-subjects design and random-intercept LMM as the throughput analysis. However, the HAND adaptive manipulation (width inflation) did not execute (width_scale_factor always 1), so UI mode is not interpretable as an adaptation for hand. The primary adaptive test is the gaze-only UI Mode × Pressure interaction. Because throughput and RT are mathematically coupled (TP = ID/RT), the sample-size logic is identical: N = 48 is sufficient for detecting dz ≈ 0.40–0.50 differences with ≈0.80 power, and N = 64 further strengthens power for smaller effects and interactions (Cohen, 1988). Trial-level modeling with many repeated observations per participant increases precision, but our power planning is intentionally conservative and based on participant-level effects rather than naïvely counting trials.
Note on unbalanced design: Same as throughput analysis: hand modality N = 75 (mouse users only), gaze modality N = 81 (mouse + trackpad users). Type III ANOVA with sum-to-zero contrasts handles this appropriately (Fox & Weisberg, 2019).
Random Effects Structure: All mixed models in this report use a random intercept for participants (1 | pid), which is a conservative and stable baseline. We may test richer random-effects structures (e.g., (1 + modality | pid)) as a robustness check.
### Model: log_rt ~ modality * ui_mode * pressure + (1 | pid)
**Model Structure:** Random intercept only (participant-level random effects). The trial-level N (≈ 14953) should not be mistaken for independent units in discussions of power; the effective N for inference is the number of participants (81).
**Data Summary:** 81 participants, 14953 trials, 8 conditions, minimum 1715 trials per condition.
#### ANOVA Table
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
modality 15.2141 15.2141 1 14904 177.9518 < 2.2e-16 ***
ui_mode 1.2045 1.2045 1 14873 14.0890 0.0001750 ***
pressure 1.0602 1.0602 1 14874 12.4007 0.0004305 ***
modality:ui_mode 1.3524 1.3524 1 14873 15.8183 7.006e-05 ***
modality:pressure 0.0375 0.0375 1 14874 0.4385 0.5078801
ui_mode:pressure 0.0009 0.0009 1 14876 0.0100 0.9203074
modality:ui_mode:pressure 0.0036 0.0036 1 14874 0.0418 0.8379702
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#### Written Results (APA Style)
**Modality Effect:** A linear mixed-effects model on log-transformed movement time revealed a significant main effect of input modality, F(1, 14904.1) = 177.95, p < .001, η²p = 0.012 (small effect).
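Hand input produced faster movement times (M = 6.950, 95% CI [6.908, 6.991]) than gaze input (M = 7.015, 95% CI [6.973, 7.057]). These estimated means are on the model's log scale, not in seconds; read as natural-log milliseconds, they correspond to roughly 1.04 s for hand and 1.11 s for gaze.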
**UI Mode Effect (Omnibus):** The main effect of UI mode on movement time was significant, F(1, 14873.2) = 14.09, p < .001, η²p = 0.001 (negligible effect). **Note:** This omnibus UI mode effect is diluted by the fact that HAND width inflation did not execute, so UI mode is not interpretable as an adaptive manipulation for hand. The interpretable test of adaptation comes from the **gaze-only UI Mode × Pressure follow-up model** below.
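Because the movement-time model is fit to log-transformed RT, its estimated means and contrasts are easier to read after back-transformation. The sketch below assumes the log is the natural log of RT in milliseconds, which is consistent with the raw means of about 1.1 s but is not stated explicitly in the report; the helper name is illustrative.

```python
from math import exp


def log_ms_to_seconds(log_ms):
    """Back-transform a natural-log-millisecond mean to seconds (a geometric mean)."""
    return exp(log_ms) / 1000.0


hand_s = log_ms_to_seconds(6.950)  # hand estimated mean on the log scale
gaze_s = log_ms_to_seconds(7.015)  # gaze estimated mean on the log scale

# A difference on the log scale is a ratio on the original scale.
gaze_to_hand_ratio = exp(7.015 - 6.950)

print(round(hand_s, 3), round(gaze_s, 3), round(gaze_to_hand_ratio, 3))
```

Read this way, gaze movement times are about 7% longer than hand movement times, and a log-scale contrast such as 0.03 corresponds to a difference of about 3%.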
#### Gaze-Only Follow-up: UI Mode × Pressure (Primary Adaptive Test)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ui_mode 1.76884 1.76884 1 6902.6 21.4673 3.665e-06 ***
pressure 0.29054 0.29054 1 6900.2 3.5262 0.06045 .
ui_mode:pressure 0.00146 0.00146 1 6907.4 0.0178 0.89397
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
**Estimated Marginal Means (Gaze-only):**
Table: Estimated Marginal Means for Movement Time (log scale): Gaze-only
|UI Mode |Pressure | Mean log RT| 95% CI Lower| 95% CI Upper|
|:--------|:--------|-----------:|------------:|------------:|
|Static |0 | 6.987| 6.931| 7.044|
|Adaptive |0 | 7.019| 6.962| 7.075|
|Static |1 | 6.999| 6.943| 7.056|
|Adaptive |1 | 7.032| 6.976| 7.089|
**Key Contrasts (Gaze-only, Holm-adjusted):**
|contrast | estimate| SE| df| z.ratio| p.value|
|:-------------------------------------|--------:|----:|---:|-------:|-------:|
|static pressure0 - adaptive pressure0 | -0.031| 0.01| Inf| -3.186| 0.006|
|static pressure0 - adaptive pressure1 | -0.045| 0.01| Inf| -4.599| <0.001|
|adaptive pressure0 - static pressure1 | 0.019| 0.01| Inf| 1.958| 0.151|
|static pressure1 - adaptive pressure1 | -0.033| 0.01| Inf| -3.348| 0.004|
#### Hand-Only Follow-up: Pressure Effect
*Note:* UI mode is excluded from hand models by design because width scaling did not execute.
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
pressure 0.81498 0.81498 1 7905.1 10.277 0.001352 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
**Estimated Marginal Means (Hand-only):**
Table: Estimated Marginal Means for Movement Time (log scale): Hand-only (pressure effect)
|Pressure | Mean log RT| 95% CI Lower| 95% CI Upper|
|:--------|-----------:|------------:|------------:|
|0 | 6.932| 6.897| 6.968|
|1 | 6.953| 6.917| 6.988|
**Pressure Contrast (Hand-only, Holm-adjusted):**
|contrast | estimate| SE| df| z.ratio| p.value|
|:---------------------|--------:|-----:|---:|-------:|-------:|
|pressure0 - pressure1 | -0.02| 0.006| Inf| -3.206| 0.001|
**Modality × UI Mode Interaction:** The interaction was significant, F(1, 14873.3) = 15.82, p < .001, η²p = 0.001 (negligible effect). Follow-up simple effects analyses are recommended.
#### Pairwise Comparisons (Holm-adjusted)
|contrast | estimate| SE| df| z.ratio| p.value|
|:-------------------------------------------------|-----------:|-----------:|---:|-------:|-------:|
|hand static pressure0 - gaze static pressure0 | -0.05063470| 0.009704014| Inf| -5.218| <0.0001|
|hand static pressure0 - hand adaptive pressure0 | -0.00038816| 0.009307398| Inf| -0.042| 1.0000|
|hand static pressure0 - gaze adaptive pressure0 | -0.08723914| 0.009647232| Inf| -9.043| <0.0001|
|hand static pressure0 - hand static pressure1 | -0.02154994| 0.009303428| Inf| -2.316| 0.1643|
|hand static pressure0 - gaze static pressure1 | -0.06386266| 0.009752761| Inf| -6.548| <0.0001|
|hand static pressure0 - hand adaptive pressure1 | -0.01901221| 0.009293782| Inf| -2.046| 0.2447|
|hand static pressure0 - gaze adaptive pressure1 | -0.10146835| 0.009685099| Inf| -10.477| <0.0001|
|gaze static pressure0 - hand adaptive pressure0 | 0.05024653| 0.009671825| Inf| 5.195| <0.0001|
|gaze static pressure0 - gaze adaptive pressure0 | -0.03660444| 0.009891420| Inf| -3.701| 0.0026|
|gaze static pressure0 - hand static pressure1 | 0.02908476| 0.009666402| Inf| 3.009| 0.0262|
|gaze static pressure0 - gaze static pressure1 | -0.01322796| 0.009987095| Inf| -1.325| 0.5968|
|gaze static pressure0 - hand adaptive pressure1 | 0.03162249| 0.009659086| Inf| 3.274| 0.0117|
|gaze static pressure0 - gaze adaptive pressure1 | -0.05083365| 0.009928210| Inf| -5.120| <0.0001|
|hand adaptive pressure0 - gaze adaptive pressure0 | -0.08685098| 0.009606228| Inf| -9.041| <0.0001|
|hand adaptive pressure0 - hand static pressure1 | -0.02116177| 0.009257010| Inf| -2.286| 0.1643|
|hand adaptive pressure0 - gaze static pressure1 | -0.06347449| 0.009711027| Inf| -6.536| <0.0001|
|hand adaptive pressure0 - hand adaptive pressure1 | -0.01862405| 0.009247490| Inf| -2.014| 0.2447|
|hand adaptive pressure0 - gaze adaptive pressure1 | -0.10108019| 0.009654203| Inf| -10.470| <0.0001|
|gaze adaptive pressure0 - hand static pressure1 | 0.06568920| 0.009579917| Inf| 6.857| <0.0001|
|gaze adaptive pressure0 - gaze static pressure1 | 0.02337648| 0.009903533| Inf| 2.360| 0.1643|
Degrees-of-freedom method: asymptotic
P value adjustment: holm method for 28 tests
5. Fitts’ Law Modelling
Research Question: How well does the data fit Fitts’ Law? (Linearity check).
Planned Sample Size & Power
Fitts’ law analyses serve primarily to validate the pointing task and modality differences, not to test the core adaptation hypotheses. The ID effect on movement time is typically very large (R² > .70), and robust Fitts-law slopes are observable with as few as 10–20 participants in classic HCI work. In this study, any final sample N ≥ 30 is more than sufficient for stable ID slopes; our planned N = 48 places this analysis in an over-powered, descriptive regime. We therefore do not perform formal power calculations here and treat Fitts regression as a manipulation check and descriptive characterization of the dataset.
Sample Size: N = 75 participants for hand modality (mouse users only), N = 81 participants for gaze modality (mouse + trackpad users).
Flatter slopes indicate less sensitivity to difficulty (ballistic movement).
Linear Regression: MT ~ IDe (N = 81 participants)
| hand |
static |
0.492 |
0.152 |
0.505 |
| hand |
adaptive |
0.470 |
0.142 |
0.544 |
| gaze |
static |
0.299 |
0.173 |
0.558 |
| gaze |
adaptive |
0.263 |
0.190 |
0.548 |
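The regression MT ~ IDe named above is an ordinary least-squares line, MT = a + b × IDe, where a is the intercept (s) and b the slope (s/bit). The sketch below shows the fit on a small made-up data set; the IDe and MT values are hypothetical and are not taken from the study, and `fit_line` is an illustrative helper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; also returns R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


ide_bits = [2.0, 3.0, 4.0, 5.0]  # hypothetical effective index of difficulty (bits)
mt_s = [0.82, 0.98, 1.10, 1.30]  # hypothetical mean movement times (s)

intercept, slope, r_squared = fit_line(ide_bits, mt_s)
print(round(intercept, 3), round(slope, 3), round(r_squared, 3))
```

A smaller slope means movement time grows less per added bit of difficulty, which is the sense in which flatter slopes indicate lower sensitivity to difficulty.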
6. Error Rate Analysis (Core Confirmatory)
Research Question: How do error rates differ across conditions?
This analysis is part of the core confirmatory battery for RQ1 and RQ3.
Sample Size: N = 75 participants for hand modality (mouse users only), N = 81 participants for gaze modality (mouse + trackpad users).
Error Rates by Condition (N = 81 participants)
|modality |ui_mode |pressure | Participants| Mean_Error_Rate| SD_Error_Rate|
|:--------|:--------|:--------|------------:|---------------:|-------------:|
|hand |static |0 | 73| 1.88| 4.83|
|hand |static |1 | 75| 1.58| 4.05|
|hand |adaptive |0 | 74| 1.55| 4.71|
|hand |adaptive |1 | 75| 1.98| 4.69|
|gaze |static |0 | 78| 18.45| 13.92|
|gaze |static |1 | 78| 19.65| 10.81|
|gaze |adaptive |0 | 80| 18.40| 12.86|
|gaze |adaptive |1 | 78| 18.09| 12.61|
Error Rate by Modality and UI Mode (participant-level means). N = 81 participants. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons.
**Error Rate Summary:** Overall error rate was 10.5%. Errors were concentrated in gaze conditions (18.8%), while hand remained near 1.7%.
Statistical Model Results
Planned Sample Size & Power
For the error-rate analysis we fit a binomial GLMM with random intercepts per participant. However, the HAND adaptive manipulation (width inflation) did not execute (width_scale_factor always 1), so UI mode is not interpretable as an adaptation for hand. The primary adaptive test is the gaze-only UI Mode × Pressure interaction. We expect odds-ratio effects in the small-to-medium range (e.g., OR ≈ 0.7–0.8 for adaptive vs static in gaze, and OR ≈ 2–3 for gaze vs hand). Binary outcomes with relatively low error rates (≈10–15%) typically require more participants than continuous outcomes for stable mixed-effects estimation (Kumle et al., 2021). For this analysis, we therefore treat N = 64 as a “good” target that yields comfortable power for medium effects, while N = 48 remains adequate but somewhat less stable, especially for interaction terms and rare error types. Error-based interaction effects are interpreted as exploratory, even at N = 64.
### Model: error ~ modality * ui_mode * pressure + (1 | pid)
**Model Structure:** Random intercept only (participant-level random effects). The trial-level N (≈ 16711) should not be mistaken for independent units in discussions of power; the effective N for inference is the number of participants (81).
**Data Summary:** 81 participants, 16711 trials, 8 conditions, minimum 1998 trials per condition.
**Overall Error Rate:** 10.5 %
#### ANOVA Table (Type III for unbalanced design)
Analysis of Deviance Table (Type III Wald chisquare tests)
Response: error
Chisq Df Pr(>Chisq)
(Intercept) 1003.8308 1 <2e-16 ***
modality 865.9770 1 <2e-16 ***
ui_mode 0.2190 1 0.6398
pressure 0.2232 1 0.6366
modality:ui_mode 0.2895 1 0.5906
modality:pressure 0.0014 1 0.9705
ui_mode:pressure 0.5398 1 0.4625
modality:ui_mode:pressure 2.5821 1 0.1081
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#### Pairwise Comparisons (Omnibus, Holm-adjusted)
*Note:* Omnibus UI mode effects are diluted by hand non-manipulation. See gaze-only follow-up below.
|contrast | odds.ratio| SE| df| null| z.ratio| p.value|
|:-------------------------------------------------|----------:|--------:|---:|----:|-------:|-------:|
|hand static pressure0 / gaze static pressure0 | 0.072752| 0.012976| Inf| 1| -14.694| <0.0001|
|hand static pressure0 / hand adaptive pressure0 | 1.230786| 0.305880| Inf| 1| 0.836| 1.0000|
|hand static pressure0 / gaze adaptive pressure0 | 0.073652| 0.013129| Inf| 1| -14.633| <0.0001|
|hand static pressure0 / hand static pressure1 | 1.190374| 0.293420| Inf| 1| 0.707| 1.0000|
|hand static pressure0 / gaze static pressure1 | 0.064129| 0.011392| Inf| 1| -15.463| <0.0001|
|hand static pressure0 / hand adaptive pressure1 | 0.954876| 0.222743| Inf| 1| -0.198| 1.0000|
|hand static pressure0 / gaze adaptive pressure1 | 0.076127| 0.013581| Inf| 1| -14.436| <0.0001|
|gaze static pressure0 / hand adaptive pressure0 | 16.917569| 3.259856| Inf| 1| 14.678| <0.0001|
|gaze static pressure0 / gaze adaptive pressure0 | 1.012366| 0.082995| Inf| 1| 0.150| 1.0000|
|gaze static pressure0 / hand static pressure1 | 16.362097| 3.109667| Inf| 1| 14.706| <0.0001|
|gaze static pressure0 / gaze static pressure1 | 0.881475| 0.071005| Inf| 1| -1.566| 1.0000|
|gaze static pressure0 / hand adaptive pressure1 | 13.125087| 2.264219| Inf| 1| 14.924| <0.0001|
|gaze static pressure0 / gaze adaptive pressure1 | 1.046398| 0.086192| Inf| 1| 0.551| 1.0000|
|hand adaptive pressure0 / gaze adaptive pressure0 | 0.059841| 0.011516| Inf| 1| -14.633| <0.0001|
|hand adaptive pressure0 / hand static pressure1 | 0.967166| 0.248472| Inf| 1| -0.130| 1.0000|
|hand adaptive pressure0 / gaze static pressure1 | 0.052104| 0.009998| Inf| 1| -15.398| <0.0001|
|hand adaptive pressure0 / hand adaptive pressure1 | 0.775826| 0.189496| Inf| 1| -1.039| 1.0000|
|hand adaptive pressure0 / gaze adaptive pressure1 | 0.061853| 0.011921| Inf| 1| -14.440| <0.0001|
|gaze adaptive pressure0 / hand static pressure1 | 16.162232| 3.065616| Inf| 1| 14.671| <0.0001|
|gaze adaptive pressure0 / gaze static pressure1 | 0.870708| 0.069299| Inf| 1| -1.740| 0.9013|
P value adjustment: holm method for 28 tests
Tests are performed on the log odds ratio scale
#### Gaze-Only Follow-up: UI Mode × Pressure (Primary Adaptive Test)
**ANOVA (Type III):**
Analysis of Deviance Table (Type III Wald chisquare tests)
Response: error
Chisq Df Pr(>Chisq)
(Intercept) 360.4270 1 <2e-16 ***
ui_mode 2.4509 1 0.1175
pressure 0.7239 1 0.3949
ui_mode:pressure 2.0557 1 0.1516
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
**Estimated Marginal Means (Gaze-only, response scale):**
Table: Estimated Marginal Means for Error Rate: Gaze-only (UI Mode × Pressure)
|UI Mode |Pressure | Mean Error Rate| 95% CI Lower| 95% CI Upper|
|:--------|:--------|---------------:|------------:|------------:|
|Static |0 | 16.6| 14.2| 19.5|
|Adaptive |0 | 16.6| 14.1| 19.4|
|Static |1 | 18.5| 15.9| 21.6|
|Adaptive |1 | 16.1| 13.7| 18.9|
**Key Contrasts (Gaze-only, Holm-adjusted, odds ratio scale):**
|contrast | odds.ratio| SE| df| null| z.ratio| p.value|
|:-------------------------------------|----------:|-----:|---:|----:|-------:|-------:|
|static pressure0 / adaptive pressure0 | 1.007| 0.082| Inf| 1| 0.082| 1.000|
|static pressure0 / adaptive pressure1 | 1.042| 0.085| Inf| 1| 0.497| 1.000|
|adaptive pressure0 / static pressure1 | 0.871| 0.069| Inf| 1| -1.738| 0.411|
|static pressure1 / adaptive pressure1 | 1.187| 0.095| Inf| 1| 2.136| 0.196|
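The odds ratios in the contrast table relate directly to the estimated error rates on the response scale. As a rough consistency check, the sketch below recomputes the static versus adaptive odds ratio under pressure from the two rounded estimated marginal means (18.5% and 16.1%); the small gap to the tabled value comes from that rounding, and the helper names are illustrative.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """Odds ratio of event probability p_a relative to p_b."""
    return odds(p_a) / odds(p_b)


# Gaze-only estimated error rates under pressure = 1 (response scale, rounded).
p_static = 0.185
p_adaptive = 0.161

or_static_vs_adaptive = odds_ratio(p_static, p_adaptive)
print(round(or_static_vs_adaptive, 3))
```

An odds ratio above 1 here means higher odds of an error in the static condition, but with a Holm-adjusted p-value of 0.196 the difference is not statistically reliable.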
#### Hand-Only Follow-up: Pressure Effect
*Note:* UI mode is excluded from hand models by design because width scaling did not execute.
**ANOVA (Type III):**
Analysis of Deviance Table (Type III Wald chisquare tests)
Response: error
Chisq Df Pr(>Chisq)
(Intercept) 59.3148 1 1.344e-14 ***
pressure 0.0429 1 0.836
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
**Estimated Marginal Means (Hand-only, response scale):**
Table: Estimated Marginal Means for Error Rate: Hand-only (pressure effect)
|Pressure | Mean Error Rate| 95% CI Lower| 95% CI Upper|
|:--------|---------------:|------------:|------------:|
|0 | 0.1| 0| 0.5|
|1 | 0.1| 0| 0.5|
**Pressure Contrast (Hand-only, Holm-adjusted, odds ratio scale):**
|contrast | odds.ratio| SE| df| null| z.ratio| p.value|
|:---------------------|----------:|-----:|---:|----:|-------:|-------:|
|pressure0 / pressure1 | 0.964| 0.173| Inf| 1| -0.207| 0.836|
7. Accuracy & Gaze Dynamics
Sample Size: N = 75 participants for hand modality (mouse users only), N = 81 participants for gaze modality (mouse + trackpad users).
Effective Width (\(W_e\))
Planned Sample Size & Power
Effective width (We) is analyzed at the participant × condition level with a Gaussian LMM. We expect medium effects of modality (gaze > hand) and small-to-medium effects of UI mode (adaptive slightly improving spatial precision). For within-subject effects of this magnitude, N ≈ 48 is sufficient for ≈0.80 power (dz ≈ 0.4–0.5) according to standard repeated-measures power guidelines (Cohen, 1988). We therefore treat N = 48 as a good target for We, with N = 64 mainly helping if UI-mode effects turn out closer to dz ≈ 0.3.
Lower \(W_e\) indicates tighter shot grouping (higher precision).
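This section does not show how \(W_e\) is computed. A common convention in Fitts' law work (ISO 9241-9) is \(W_e = 4.133 \times SD_x\), where \(SD_x\) is the standard deviation of selection endpoints along the task axis. The sketch below assumes that convention; whether this report's pipeline uses exactly this definition is an assumption, and the function and variable names are illustrative.

```python
import math
from statistics import stdev

def effective_width(endpoints, start, target):
    """Effective width W_e = 4.133 * SD of endpoints projected onto the task axis."""
    ax, ay = target[0] - start[0], target[1] - start[1]
    length = math.hypot(ax, ay)
    ux, uy = ax / length, ay / length  # unit vector from start to target
    # Signed deviation of each endpoint from the target centre along the task axis.
    deviations = [(x - target[0]) * ux + (y - target[1]) * uy for x, y in endpoints]
    return 4.133 * stdev(deviations)

# Hypothetical endpoints (px) for a rightward movement from (0, 0) to (300, 0).
taps = [(295.0, 2.0), (304.0, -1.0), (310.0, 3.0), (291.0, 0.0), (300.0, 1.0)]
we_px = effective_width(taps, (0.0, 0.0), (300.0, 0.0))
print(round(we_px, 2))
```

Because only the spread along the task axis enters the calculation, a tighter cluster of endpoints yields a smaller \(W_e\) regardless of where the cluster is centred.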
Effective Width (px) by Condition (N = 81 participants)
|Modality |UI Mode  |Pressure |  N| Mean We (px)| SD (px)|
|:--------|:--------|:--------|--:|------------:|-------:|
|hand     |static   |0        | 73|        33.68|   20.76|
|hand     |static   |1        | 75|        33.13|   20.49|
|hand     |adaptive |0        | 74|        33.46|   20.98|
|hand     |adaptive |1        | 75|        34.75|   21.45|
|gaze     |static   |0        | 77|        35.84|   19.60|
|gaze     |static   |1        | 78|        35.72|   19.77|
|gaze     |adaptive |0        | 80|        34.76|   18.93|
|gaze     |adaptive |1        | 78|        36.42|   20.10|
Effective target width was broadly similar between Static and Adaptive within each modality; gaze showed slightly larger We overall, consistent with higher variability in endpoint location.
Statistical Analysis: Effective Width
### ANOVA: Effective Width (Type III)
Table: Mixed-effects model: We ~ Modality × UI Mode × Pressure (N = 81 participants)
| | Sum Sq| Mean Sq| NumDF| DenDF| F value| Pr(>F)|
|:-------------------------|---------:|---------:|-----:|--------:|-------:|------:|
|modality | 1703.1717| 1703.1717| 1| 1783.168| 4.1869| 0.0409|
|ui_mode | 29.8092| 29.8092| 1| 1726.387| 0.0733| 0.7867|
|pressure | 137.8330| 137.8330| 1| 1729.079| 0.3388| 0.5606|
|modality:ui_mode | 85.6434| 85.6434| 1| 1727.324| 0.2105| 0.6464|
|modality:pressure | 16.6078| 16.6078| 1| 1728.095| 0.0408| 0.8399|
|ui_mode:pressure | 381.2582| 381.2582| 1| 1735.188| 0.9372| 0.3331|
|modality:ui_mode:pressure | 0.0094| 0.0094| 1| 1728.993| 0.0000| 0.9962|
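**Modality effect:** Significant, F(1, 1783.2) = 4.19, p = 0.041; gaze produced slightly larger We than hand.
**UI mode effect:** Non-significant, F(1, 1726.4) = 0.07, p = 0.787.
**Modality × UI mode interaction:** Non-significant, F(1, 1727.3) = 0.21, p = 0.646.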
### Estimated Marginal Means (by Modality × UI Mode × Pressure)
Table: Effective Width (px) by Modality, UI Mode, and Pressure
|ui_mode |modality |pressure | emmean| SE| df| lower.CL| upper.CL|
|:--------|:--------|:--------|------:|----:|-------:|--------:|--------:|
|static |hand |0 | 33.69| 1.38| 1398.11| 30.98| 36.41|
|adaptive |hand |0 | 33.46| 1.37| 1395.54| 30.77| 36.16|
|static |gaze |0 | 35.88| 1.36| 1402.92| 33.22| 38.54|
|adaptive |gaze |0 | 34.79| 1.34| 1403.27| 32.16| 37.42|
|static |hand |1 | 33.13| 1.37| 1396.20| 30.45| 35.81|
|adaptive |hand |1 | 34.75| 1.36| 1393.31| 32.07| 37.42|
|static |gaze |1 | 35.71| 1.35| 1398.17| 33.07| 38.35|
|adaptive |gaze |1 | 36.45| 1.36| 1411.59| 33.78| 39.11|
### Pairwise Comparisons (Holm-adjusted)
Table: UI Mode comparisons within each Modality × Pressure cell
|modality |pressure |contrast          | estimate|    SE| t-ratio| p-value|       df|
|:--------|:--------|:-----------------|--------:|-----:|-------:|-------:|--------:|
|hand     |0        |static - adaptive |    0.231| 1.921|    0.12|   0.904| 1723.504|
|gaze     |0        |static - adaptive |    1.093| 1.882|    0.58|   0.561| 1734.248|
|hand     |1        |static - adaptive |   -1.616| 1.904|   -0.85|   0.396| 1720.903|
|gaze     |1        |static - adaptive |   -0.736| 1.888|   -0.39|   0.697| 1735.628|
**APA-formatted summary (omnibus):** No significant differences in effective width between UI modes within modalities (all p ≥ 0.396).
#### Gaze-Only Follow-up: UI Mode × Pressure
**Estimated Marginal Means (Gaze-only):**
Table: Effective Width: Gaze-only (UI Mode × Pressure)
|UI Mode |Pressure | Mean We (px)|
|:--------|:--------|------------:|
|Static |0 | 35.84|
|Adaptive |0 | 34.76|
|Static |1 | 35.72|
|Adaptive |1 | 36.42|
**Key Contrasts (Gaze-only, Holm-adjusted):**
|contrast | estimate| SE| df| t.ratio| p.value|
|:-------------------------------------|--------:|-----:|-------:|-------:|-------:|
|static pressure0 - adaptive pressure0 | 1.080| 1.828| 849.096| 0.591| 1|
|static pressure0 - adaptive pressure1 | -0.580| 1.842| 844.157| -0.315| 1|
|adaptive pressure0 - static pressure1 | -0.961| 1.820| 842.471| -0.528| 1|
|static pressure1 - adaptive pressure1 | -0.700| 1.834| 850.539| -0.381| 1|
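The adjusted p values of 1 in the table above follow directly from the Holm step-down procedure applied to four weak contrasts. The sketch below is a generic Holm implementation, not the report's own code; raw p values are approximated from the t ratios with a normal distribution, which is an assumption that is reasonable at df ≈ 845.

```python
from statistics import NormalDist

def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted

# t ratios from the gaze-only contrast table above.
t_ratios = [0.591, -0.315, -0.528, -0.381]
raw_p = [2.0 * (1.0 - NormalDist().cdf(abs(t))) for t in t_ratios]
adjusted_p = holm_adjust(raw_p)
print(adjusted_p)
```

The smallest raw p value is already above 0.5, so multiplying it by the number of contrasts exceeds 1 and every adjusted value is capped at 1.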
#### Hand-Only Follow-up: Pressure Effect
*Note:* UI mode is excluded from hand models by design because width scaling did not execute.
**Estimated Marginal Means (Hand-only):**
Table: Effective Width: Hand-only (pressure effect)
|Pressure | Mean We (px)|
|:--------|------------:|
|0 | 33.57|
|1 | 33.94|
**Pressure Contrast (Hand-only, Holm-adjusted):**
|contrast | estimate| SE| df| t.ratio| p.value|
|:---------------------|--------:|-----:|-------:|-------:|-------:|
|pressure0 - pressure1 | -0.368| 1.402| 821.242| -0.263| 0.793|
Endpoint Accuracy Scatter Plot
Visualization of endpoint errors relative to target center. Each point represents one trial’s endpoint position.
Endpoint Error Distance (px) for Gaze Modality
|UI Mode  |Pressure |    N|  Mean|   SD| Median|
|:--------|:--------|----:|-----:|----:|------:|
|static   |0        | 1734| 11.45| 7.81|   9.37|
|static   |1        | 1715| 11.75| 7.85|   9.73|
|adaptive |0        | 1782| 11.66| 7.96|   9.53|
|adaptive |1        | 1749| 11.77| 8.04|   9.78|
The “Midas Touch” Struggle
Planned Sample Size & Power
Target re-entries are count-like and somewhat noisy, but we again analyze participant-level averages with an LMM (or, if needed, a Poisson GLMM). We anticipate medium modality effects (more re-entries for gaze) and small-to-medium UI-mode effects (fewer re-entries under adaptation). Given the noisier nature of this metric, a slightly larger sample is desirable if it is to be treated as confirmatory. We therefore treat N = 48 as adequate but exploratory and N = 64 as a “good” sample size for detecting medium within-subject effects in re-entry counts. Power reasoning follows the same logic as other continuous repeated-measures outcomes, tempered by mixed-model guidance from Kumle et al. (2021).
Target Re-entries measure how often the cursor drifted out of the target before selection.
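One way to make this definition concrete is to count outside-to-inside transitions in the sampled cursor path and discard the first entry. The sketch below assumes a circular target and a list of (x, y) samples; the function name, the sample data, and the circular-target assumption are illustrative and may differ from how the logging code counts re-entries.

```python
import math

def count_reentries(path, center, radius):
    """Count how often the cursor re-enters a circular target after first entering it."""
    entries = 0
    was_inside = False
    for x, y in path:
        inside = math.hypot(x - center[0], y - center[1]) <= radius
        if inside and not was_inside:
            entries += 1  # an outside -> inside transition
        was_inside = inside
    return max(0, entries - 1)  # the first entry is not a re-entry

# Hypothetical cursor samples: enter, drift out, re-enter, then select.
samples = [(0, 0), (40, 0), (95, 0), (112, 0), (103, 0), (100, 0)]
n_reentries = count_reentries(samples, center=(100, 0), radius=10)
print(n_reentries)
```

A trial in which the cursor enters once and stays inside until selection scores zero under this definition.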
Re-entries are interpreted here as a proxy for control stability; higher counts suggest more corrective movements. We will revisit this metric in the control-theory analyses (Section 10).
Target Re-entries by Condition (N = 81 participants)
|Modality |UI Mode  |Pressure |  N| Mean|   SD|
|:--------|:--------|:--------|--:|----:|----:|
|hand     |static   |0        | 73| 0.86| 0.55|
|hand     |static   |1        | 75| 0.85| 0.55|
|hand     |adaptive |0        | 74| 0.87| 0.55|
|hand     |adaptive |1        | 75| 0.85| 0.57|
|gaze     |static   |0        | 77| 2.19| 1.34|
|gaze     |static   |1        | 78| 2.19| 1.33|
|gaze     |adaptive |0        | 80| 2.26| 1.45|
|gaze     |adaptive |1        | 78| 2.18| 1.32|
8. Workload (NASA-TLX) (Core Confirmatory)
Subjective workload scores (lower is better).
Research Question: How does subjective workload differ across conditions? Does Adaptive UI reduce workload?
This analysis is part of the core confirmatory battery for RQ2 and RQ3.
Metric Definition: We use the unweighted NASA-TLX, computed as the mean of the six subscales (Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, Frustration). Each subscale is rated on a 0-100 scale, and the overall TLX score is the arithmetic mean of all six subscales. Lower values indicate lower subjective workload.
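The metric definition above amounts to a plain average over six ratings. A minimal sketch follows, with hypothetical subscale keys and ratings; the range check reflects the stated 0-100 scale.

```python
from statistics import mean

SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")

def raw_tlx(ratings: dict) -> float:
    """Unweighted NASA-TLX: arithmetic mean of the six 0-100 subscale ratings."""
    missing = [s for s in SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    for s in SUBSCALES:
        if not 0 <= ratings[s] <= 100:
            raise ValueError(f"{s} rating out of range: {ratings[s]}")
    return mean(ratings[s] for s in SUBSCALES)

# Hypothetical ratings for one block.
block = {"mental": 55, "physical": 30, "temporal": 45,
         "performance": 40, "effort": 60, "frustration": 40}
overall = raw_tlx(block)
print(overall)
```

Because every subscale carries equal weight, a 6-point change on any single subscale moves the overall score by 1 point.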
Sample Size: N = 75 participants for hand modality (mouse users only), N = 75 participants for gaze modality (mouse + trackpad users).
Statistical Model: Overall TLX
Planned Sample Size & Power
NASA-TLX scores (overall and subscales) are collected at the block level and analyzed with an LMM (random intercepts per participant; fixed effects for modality and UI mode). TLX scores tend to be reasonably reliable, and we expect medium effects for both modality (gaze > hand) and UI mode (adaptive < static), especially on Physical Demand and Frustration. For within-subject designs with medium effects, ≈40–50 participants typically provide ≥0.80 power (Brysbaert, 2019). We therefore treat N = 48 as a good, pre-planned N for TLX analyses. An increase to N = 64 would mostly refine confidence intervals and interaction estimates rather than change the main power conclusions.
Note on unbalanced design: As in the other analyses, the design is unbalanced: hand modality N=70 (mouse users only), gaze modality N=75 (70 mouse + 5 trackpad users). Type III ANOVA with sum-to-zero contrasts handles this appropriately (Fox & Weisberg, 2019).
Random Effects Structure: All mixed models in this report use a random intercept for participants (1 | pid), which is a conservative and stable baseline. We may test richer random-effects structures (e.g., (1 + modality | pid)) as a robustness check.
### Model: overall_tlx ~ modality * ui_mode + (1 | pid)
**Data Summary:** 81 participants, 763 observations.
#### ANOVA Table
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
modality 4346.2 4346.2 1 550.02 73.9854 <2e-16 ***
ui_mode 75.0 75.0 1 550.19 1.2770 0.2590
modality:ui_mode 118.4 118.4 1 550.10 2.0157 0.1562
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#### Written Results (APA Style)
**Modality Effect:** A linear mixed-effects model revealed a significant main effect of input modality on overall NASA-TLX workload, F(1, 550.0) = 73.99, p < .001, η²p = 0.119 (medium effect).
Gaze input produced higher workload (M = 46.5, 95% CI [43.2, 49.8]) than hand input (M = 41.1, 95% CI [37.8, 44.4]).
**UI Mode Effect (Omnibus):** The main effect of UI mode on workload was non-significant, F(1, 550.2) = 1.28, p = 0.259, η²p = 0.002 (negligible effect). **Note:** This omnibus UI mode effect is diluted by the fact that HAND width inflation did not execute, so UI mode is not interpretable as an adaptive manipulation for hand. The interpretable test of adaptation comes from the **gaze-only UI Mode follow-up model** below.
#### Gaze-Only Follow-up: UI Mode (Primary Adaptive Test)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ui_mode 170 170 1 240.66 3.6389 0.05763 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
**Estimated Marginal Means (Gaze-only):**
Table: Estimated Marginal Means for Overall TLX: Gaze-only
|UI Mode | Mean TLX| 95% CI Lower| 95% CI Upper|
|:--------|--------:|------------:|------------:|
|Static | 45.4| 42.1| 48.6|
|Adaptive | 46.9| 43.5| 50.2|
**Contrast (Gaze-only, Holm-adjusted):**
|contrast | estimate| SE| df| t.ratio| p.value|
|:-----------------|--------:|-----:|-------:|-------:|-------:|
|static - adaptive | -1.508| 0.792| 242.818| -1.903| 0.058|
**Modality × UI Mode Interaction:** The interaction was non-significant, F(1, 550.1) = 2.02, p = 0.156, η²p = 0.004 (negligible effect). The effect of UI mode on workload did not differ significantly between modalities.
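The reported η²p values are consistent with the standard conversion from an F statistic, η²p = (F × df1) / (F × df1 + df2). Whether the report used exactly this formula is an assumption, but it reproduces all three values from the omnibus ANOVA table above.

```python
def partial_eta_squared(f_value: float, df_num: float, df_den: float) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_num) / (f_value * df_num + df_den)

# F values and Satterthwaite denominator df from the omnibus TLX ANOVA table.
effects = {
    "modality": (73.9854, 1, 550.02),
    "ui_mode": (1.2770, 1, 550.19),
    "modality:ui_mode": (2.0157, 1, 550.10),
}
for name, (f, df1, df2) in effects.items():
    print(f"{name}: {partial_eta_squared(f, df1, df2):.3f}")
```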
#### Estimated Marginal Means (Overall TLX by Modality × UI Mode)
Table: Estimated Marginal Means for Overall TLX by Condition (95% CI)
|Modality |UI Mode | Mean TLX| 95% CI Lower| 95% CI Upper|
|:--------|:--------|--------:|------------:|------------:|
|Hand |Static | 41.2| 37.8| 44.5|
|Gaze |Static | 45.7| 42.3| 49.0|
|Hand |Adaptive | 41.0| 37.5| 44.4|
|Gaze |Adaptive | 47.3| 43.8| 50.7|
Advanced TLX Analysis: UX Insights
Research Questions:
- Which subscales drive overall workload? Are there different workload profiles for hand vs. gaze?
- Is there a performance-workload trade-off? Do participants who report lower workload perform better?
- How do workload sources differ between modalities?
These analyses provide deeper UX insights into workload patterns and their relationship to performance.
### Workload Profile Analysis
**Dominant Workload Sources:**
### Workload-Performance Relationship
### Modality-Specific Workload Patterns
**Workload Differences: Hand vs. Gaze (by UI Mode)**
### Individual Differences in Workload
**Participants with Highest Average Workload (Top 5):**
**Participants with Lowest Average Workload (Bottom 5):**
**Workload Consistency Across Conditions:**
Participants with low SD report similar workload across all conditions.
Participants with high SD show large workload differences between conditions.
**Interpretation:**
- **Subscale contributions** reveal which aspects of workload are most prominent in each condition.
- **Workload-performance correlations** show whether lower workload is associated with better performance.
- **Workload efficiency** quantifies performance per unit of reported workload.
- **Modality differences** highlight which subscales are most affected by input modality.
- **Individual differences** identify participants who consistently report high/low workload.
9. Participant Awareness & Strategy (Debrief Analysis)
Research Question: Did participants notice the adaptive interface? Did they change their strategy? How do awareness and strategy relate to performance?
Sample Size: N = 41 participants with debrief responses.
This section analyzes post-experiment debrief responses to understand participant awareness of the adaptive interface and self-reported strategy changes.
**Debrief Response Coverage:** 41 participants provided debrief responses.
### Thematic Analysis of Debrief Responses
**Q1: Did you notice the interface adapting?**
**Sample Responses by Category:**
**Not Noticed:**
- "I noticed size differences, but no dynamic growth. I did not notice any text change."
- "I saw that the interface and targets changed, but I did not realize that it was a result of my performance."
- "No"
**Noticed Size Changes:**
- "sometimes they got bigger"
- "During Gaze Mode it was hard to track the target due to bloom (circles) appearing continuously as I tried to hit the targets."
- "I thought changes were random."
**Noticed:**
- "Yes"
- "Yes the space bar was harder to use."
**Other/Unclear:**
- "a little."
- "I had to scroll laterally slightly for some to fir the time box in the screen"
**Q2: Did you change your strategy during the experiment?**
**Sample Responses by Category:**
**No Strategy Change:**
- "Regardless of trial, I attempted to click the designated target as quickly as possible; no conscious change to technique."
- "No"
- "it was almost halfway through the experiment when I noticed the right side bar provides feedback on how fast or slow I'm going, and I began to peek a..."
**Focused on Speed:**
- "When the targets became easier, I focused less on accuracy and focused more on speed, as the larger targets required less focus and accuracy."
- "easy targets made me faster."
- "I went faster when the targets were bigger because I was less worried about making a mistake."
**Adapted to Easier Targets:**
- "I proceeded more slowly when the target shrank in size."
- "When targets became smaller, I would slow down because it's harder to get the placement inside a smaller radius."
- "When the target gets smaller, I pay more attention to clicking in the right spot."
**Strategy Changed:**
- "Yes"
- "yes, Instead of focusing on the red dots, I focused on pressing the space when the "press space" button showed up"
- "I changed my strategy to move cursor as fast as possible in space clicking tasks, because it was easier to use both hands instead of using just mouse ..."
**Other/Unclear:**
- "I had to slow up with the shaking dot."
**Focused on Accuracy:**
- "I did go a bit slower for smaller targets. I would move the mouse quickly to their general area then slow down and focus onto it"
### Relationship to Performance
**Performance by Adaptation Awareness:**
**Performance by Strategy Category:**
**Interpretation:**
- Participants who noticed adaptation may have different performance patterns.
- Strategy changes (e.g., focusing on speed vs. accuracy) may relate to performance outcomes.
- These relationships are exploratory and should be interpreted with caution due to self-report biases.
10. Learning Curves & Practice Effects
Research Question: How does performance change within each condition? Do learning rates differ by condition?
Sample Size: N = 81 participants with trial-level data.
Note: These learning curves serve as a quality check that participants improved modestly and reached a plateau; we do not treat these as primary inferential outcomes. This analysis is exploratory/QC only.
This section shows learning curves aligned by condition start (accounting for Williams counterbalancing). For block-level trends, see Section 13.
Learning Curve Data Summary by Condition (N = 81 participants)
|Modality |UI Mode  |Pressure |  N| Mean time (s)| Error rate|
|:--------|:--------|:--------|--:|-------------:|----------:|
|Hand     |Static   |OFF      | 27|         1.087|     0.0188|
|Hand     |Static   |ON       | 27|         1.106|     0.0159|
|Hand     |Adaptive |OFF      | 27|         1.081|     0.0155|
|Hand     |Adaptive |ON       | 27|         1.101|     0.0198|
|Gaze     |Static   |OFF      | 27|         1.169|     0.1844|
|Gaze     |Static   |ON       | 27|         1.182|     0.2067|
|Gaze     |Adaptive |OFF      | 27|         1.235|     0.1840|
|Gaze     |Adaptive |ON       | 27|         1.242|     0.1809|
Error Rate Summary by Condition
|Modality |UI Mode  |Pressure |  N|   Mean|    Min|    Max|
|:--------|:--------|:--------|--:|------:|------:|------:|
|Hand     |Static   |OFF      | 27|  1.88%|  0.00%|  4.11%|
|Hand     |Static   |ON       | 27|  1.59%|  0.00%|  4.05%|
|Hand     |Adaptive |OFF      | 27|  1.55%|  0.00%|  5.41%|
|Hand     |Adaptive |ON       | 27|  1.98%|  0.00%|  8.00%|
|Gaze     |Static   |OFF      | 27| 18.44%| 11.54%| 24.34%|
|Gaze     |Static   |ON       | 27| 20.67%| 10.13%| 30.38%|
|Gaze     |Adaptive |OFF      | 27| 18.40%| 13.75%| 26.58%|
|Gaze     |Adaptive |ON       | 27| 18.09%|  8.97%| 28.21%|
Note: Data aligned by position within condition to account for Williams counterbalancing. For block-level trends, see Section 13: Block Order & Temporal Effects.
11. Movement Quality Metrics
Submovement Analysis
Research Question: Does adaptive UI reduce movement corrections? How do submovements relate to performance?
Submovements indicate intermittent control - fewer submovements suggest smoother, more ballistic movements.
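The report does not show how submovements are detected. A common approach counts local maxima in the cursor speed profile; the sketch below illustrates that idea under stated assumptions. The sample format, the `min_speed` threshold, and the function names are hypothetical, and whether the primary movement peak is included in the count is a convention the report does not state.

```python
import math

def speed_profile(samples):
    """Speeds (px/s) between consecutive (t_seconds, x, y) samples."""
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt > 0:  # skip duplicate or out-of-order timestamps
            speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return speeds

def count_velocity_peaks(speeds, min_speed=50.0):
    """Count strict local maxima of the speed profile that reach min_speed."""
    peaks = 0
    for prev, cur, nxt in zip(speeds, speeds[1:], speeds[2:]):
        if cur > prev and cur > nxt and cur >= min_speed:
            peaks += 1
    return peaks

# Hypothetical speed profile with a primary peak and one corrective peak.
demo_speeds = [10, 200, 400, 150, 60, 180, 40]
n_peaks = count_velocity_peaks(demo_speeds)
print(n_peaks)
```

The threshold matters in practice: set too high, it suppresses every peak and yields a count of zero for all trials.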
Planned Sample Size & Power
Submovement count is a noisier movement-quality metric and is currently based on pre-computed peaks. We anticipate small-to-medium effects of UI mode (adaptive reducing corrective movements) and medium effects of modality, but with considerable between-participant variability. For such count-based metrics, simulation-based power analysis is strongly recommended (e.g., using the approach in Kumle et al., 2021). As a rule of thumb, N = 64–72 would be needed to treat submovement differences as confirmatory (especially for UI-mode effects), whereas N = 48 is more appropriate for exploratory visualization and effect-size estimation rather than strict NHST.
Data Availability Note: Submovement metrics are available for a subset of the sample (see counts below). Results in this section are descriptive engineering diagnostics.
- Participants with submovement_count (legacy, pre-computed): N = 3
- Participants with submovement_count_recomputed (from trajectory data): N = 71
- Participants with full trajectory JSON data: N = 71
Submovement Count by Condition (N = 71 participants, using submovement_count_recomputed)
|Modality |UI Mode  |Pressure | N participants| N trials| Mean|   SD| Median|
|:--------|:--------|:--------|--------------:|--------:|----:|----:|------:|
|hand     |static   |0        |             63|     1715| 0.00| 0.00|      0|
|hand     |static   |1        |             65|     1748| 0.00| 0.00|      0|
|hand     |adaptive |0        |             64|     1738| 0.00| 0.00|      0|
|hand     |adaptive |1        |             65|     1758| 0.00| 0.00|      0|
|gaze     |static   |0        |             69|     1545| 8.51| 4.76|      8|
|gaze     |static   |1        |             68|     1501| 8.68| 4.62|      8|
|gaze     |adaptive |0        |             70|     1572| 8.85| 5.26|      8|
|gaze     |adaptive |1        |             68|     1540| 9.06| 5.04|      8|
ℹ **Note:** Hand modality shows zero submovements: no velocity peaks were detected, indicating very smooth movements. This is valid data. Plots show gaze modality only.
Verification Time Analysis
Research Question: How much time is spent “stopping” vs. “moving”? Does adaptive UI reduce verification time?
Sample Size: N = 81 participants with verification time data.
Planned Sample Size & Power
Verification time (from first target entry to final selection) is conceptually closer to a decision-phase measure and serves as a bridge to future LBA modeling. We again expect medium modality effects and small-to-medium UI-mode effects, and we analyze it via an LMM. Because this outcome is continuous and based on many trials per participant, N = 48 is a good target for medium effects, and N = 64 provides added stability for smaller UI-mode differences or more complex interaction patterns. The same repeated-measures power guidelines apply as for RT and TP (Cohen, 1988).
Verification time represents the “precise stopping” phase, separate from the ballistic movement phase.
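The decomposition described above can be sketched as a simple split of each trial at the first target entry. The timestamp names are hypothetical, and treating the interval from trial start to first entry as the movement phase is an assumption; only the verification interval (first entry to final selection) is defined in the text.

```python
def split_movement_and_verification(trial_start_ms, first_entry_ms, selection_ms):
    """Split a trial into movement and verification phases (all times in ms)."""
    if not trial_start_ms <= first_entry_ms <= selection_ms:
        raise ValueError("expected trial_start <= first_entry <= selection")
    movement_ms = first_entry_ms - trial_start_ms      # ballistic approach (assumed)
    verification_ms = selection_ms - first_entry_ms    # first entry to final selection
    return movement_ms, verification_ms

# Hypothetical trial: target first entered at 620 ms, selected at 1090 ms.
movement, verification = split_movement_and_verification(0, 620, 1090)
print(movement, verification)
```

By construction the two phases sum to the total trial time, so a shift in one phase at constant total time implies an opposite shift in the other.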
### Verification Phase Decomposition
Confirmation Event Source by Condition. What triggered the final confirmation?
|Modality |UI Mode  |Pressure |Source | N trials| % of all trials|
|:--------|:--------|:--------|:------|--------:|---------------:|
|hand     |static   |0        |click  |     1458|            12.1|
|hand     |static   |1        |click  |     1469|            12.2|
|hand     |adaptive |0        |click  |     1485|            12.3|
|hand     |adaptive |1        |click  |     1483|            12.3|
|gaze     |static   |0        |space  |     1545|            12.8|
|gaze     |static   |1        |space  |     1501|            12.5|
|gaze     |adaptive |0        |space  |     1572|            13.0|
|gaze     |adaptive |1        |space  |     1540|            12.8|
12. Error Patterns & Types
Research Question: What types of errors occur? Do error patterns differ by condition?
Sample Size: N = 81 participants with error type data.
**Error Type Summary:** Overall error rates were 19% for gaze and 1.7% for hand. Error patterns differed substantially by modality: gaze errors were predominantly slips (99%), while hand errors were predominantly misses (93.1%). This pattern is consistent with the modality characteristics: gaze is more prone to accidental selections (slips) due to the Midas touch problem, while hand pointing is more prone to missing targets. Adaptive UI has not yet shown a clear reduction in any specific error type at N = 81.
13. Block Order & Temporal Effects
Research Question: Are there order effects? Does performance improve or degrade over blocks?
Sample Size: N = 81 participants with block-level data.
Note: This section is exploratory/QC only. These analyses serve as quality checks for temporal trends and are not treated as primary inferential outcomes.
Block-Level Data Summary by Condition
|Modality |UI Mode  |Pressure | N Blocks| Mean Error Rate|
|:--------|:--------|:--------|--------:|---------------:|
|Hand     |Static   |OFF      |        8|           1.81%|
|Hand     |Static   |ON       |        8|           1.44%|
|Hand     |Adaptive |OFF      |        8|           1.38%|
|Hand     |Adaptive |ON       |        8|           1.90%|
|Gaze     |Static   |OFF      |        8|          18.27%|
|Gaze     |Static   |ON       |        8|          20.71%|
|Gaze     |Adaptive |OFF      |        8|          18.66%|
|Gaze     |Adaptive |ON       |        8|          18.55%|
Performance Across Blocks: Movement Time. Movement time by block number. Lower is better. Shaded regions show ±1 SE.
14. Spatial Patterns & Heatmaps
Research Question: Are there spatial biases in performance? Do some screen regions show better/worse performance? Do error patterns differ between conditions?
Sample Size: N = 81 participants with spatial position data.
Note: This section includes both descriptive visualizations and inferential statistical tests. At N=81, spatial analyses provide insights into XR-specific patterns (e.g., top vs bottom of visual field) and condition differences.
Error Density Heatmap
Where do endpoint errors occur? Are there systematic spatial biases?
### Statistical Tests: Error Density Differences
**Error Distance (Magnitude) Comparison:**
Table: ANOVA: Error Distance ~ UI Mode
| | Sum Sq| Mean Sq| NumDF| DenDF| F value| Pr(>F)|
|:-------|------:|-------:|-----:|--------:|-------:|------:|
|ui_mode | 0.0444| 0.0444| 1| 6859.323| 0.1156| 0.7339|
**Estimated Marginal Means (Error Distance, px):**
|ui_mode | response| SE| df| asymp.LCL| asymp.UCL|
|:--------|--------:|-----:|---:|---------:|---------:|
|static | 9.378| 0.200| Inf| 8.993| 9.779|
|adaptive | 9.431| 0.201| Inf| 9.045| 9.832|
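⚠ The contrast table for the error distance model could not be rendered: the formatting step `t-ratio = round(t.ratio, 2)` failed with `object 't.ratio' not found`. With asymptotic degrees of freedom (df = Inf, as in the table above), emmeans reports a `z.ratio` column rather than `t.ratio`.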
**Error Bias in X-Direction:**
Table: ANOVA: Error X ~ UI Mode
| | Sum Sq| Mean Sq| NumDF| DenDF| F value| Pr(>F)|
|:-------|-------:|-------:|-----:|-------:|-------:|------:|
|ui_mode | 28.7645| 28.7645| 1| 6882.32| 0.3012| 0.5831|
**Estimated Marginal Means (Error X, px):**
|ui_mode | emmean| SE| df| asymp.LCL| asymp.UCL|
|:--------|------:|-----:|---:|---------:|---------:|
|static | 0.546| 0.208| Inf| 0.138| 0.955|
|adaptive | 0.417| 0.207| Inf| 0.011| 0.822|
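**X-direction bias:** Both UI modes showed a small positive X bias (static 0.546 px, asymptotic CI [0.138, 0.955]; adaptive 0.417 px, asymptotic CI [0.011, 0.822]), with no significant difference between UI modes, F(1, 6882.3) = 0.30, p = 0.583.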
**Error Bias in Y-Direction:**
Table: ANOVA: Error Y ~ UI Mode
| | Sum Sq| Mean Sq| NumDF| DenDF| F value| Pr(>F)|
|:-------|-------:|-------:|-----:|-------:|-------:|------:|
|ui_mode | 71.9155| 71.9155| 1| 6875.42| 0.8379| 0.36|
**Estimated Marginal Means (Error Y, px):**
|ui_mode | emmean| SE| df| asymp.LCL| asymp.UCL|
|:--------|------:|-----:|---:|---------:|---------:|
|static | 1.367| 0.222| Inf| 0.932| 1.803|
|adaptive | 1.572| 0.221| Inf| 1.139| 2.005|
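**Y-direction bias:** Both UI modes showed a small positive Y bias (static 1.367 px, asymptotic CI [0.932, 1.803]; adaptive 1.572 px, asymptotic CI [1.139, 2.005]), with no significant difference between UI modes, F(1, 6875.4) = 0.84, p = 0.360.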
**Kolmogorov-Smirnov Test: Error Distance Distributions**
D = 0.0101, p = 0.9945
○ No significant difference in distributions
**Note:** 2D spatial pattern comparisons (e.g., 2D KS test) would require specialized packages.
Current analysis focuses on univariate comparisons (distance, X-bias, Y-bias).
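For readers unfamiliar with the statistic, D is the largest vertical gap between the two empirical cumulative distribution functions, so D = 0.0101 means the Static and Adaptive error-distance distributions never differ by more than about one percentage point. The sketch below computes D only (not the p value) in pure Python; it illustrates the definition and is not the report's test code.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the maximum gap is attained at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical samples: partially overlapping distributions.
d_demo = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
print(d_demo)
```

Identical samples give D = 0 and fully separated samples give D = 1, which brackets the value reported above near the "no difference" end.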
15. Adaptive UI Mechanism Analysis
Root-Cause Diagnostic: Width Scaling Non-Activation
Research Question: Why did hand width inflation (width_scale_factor) fail to activate? Was this due to strict thresholds/gates or a bug/misconfiguration?
### Diagnostic: Why Did Hand Width Scaling Not Activate?
**Trigger-related columns in df_raw:**
- adaptation_triggered
- timeout_triggered
- width_scale_factor
- alignment_gate_false_triggers
- debrief_q1_adaptation_noticed
**Trigger summary for HAND/Adaptive/Pressure ON:**
- adaptation_triggered :
|pid |unique_vals | n_non_na|
|:----|:-----------|--------:|
|P001 |FALSE | 27|
|P002 |FALSE | 27|
|P003 |FALSE | 27|
|P004 |FALSE | 27|
|P005 |FALSE | 27|
|P006 |FALSE | 27|
|P007 |FALSE | 27|
|P008 |FALSE | 27|
|P009 |FALSE | 27|
|P010 |FALSE | 27|
- timeout_triggered :
|pid |unique_vals | n_non_na|
|:----|:-----------|--------:|
|P001 |FALSE | 27|
|P004 |FALSE | 27|
|P005 |FALSE | 27|
|P006 |FALSE | 27|
|P011 |FALSE | 27|
|P012 |FALSE | 27|
|P013 |FALSE | 27|
|P014 |FALSE | 27|
|P016 |FALSE | 27|
|P017 |FALSE | 27|
- width_scale_factor :
|pid | mean_val| median_val| max_val| pct_nonzero|
|:----|--------:|----------:|-------:|-----------:|
|P001 | 1| 1| 1| 100|
|P002 | NaN| NA| -Inf| NaN|
|P003 | NaN| NA| -Inf| NaN|
|P004 | 1| 1| 1| 100|
|P005 | 1| 1| 1| 100|
|P006 | 1| 1| 1| 100|
|P007 | NaN| NA| -Inf| NaN|
|P008 | NaN| NA| -Inf| NaN|
|P009 | 1| 1| 1| 100|
|P010 | 1| 1| 1| 100|
- alignment_gate_false_triggers :
|pid | mean_val| median_val| max_val| pct_nonzero|
|:----|--------:|----------:|-------:|-----------:|
|P001 | 0.37| 0| 2| 33.33|
|P002 | NaN| NA| -Inf| NaN|
|P003 | NaN| NA| -Inf| NaN|
|P004 | 0.22| 0| 1| 22.22|
|P005 | 0.07| 0| 1| 7.41|
|P006 | NaN| NA| -Inf| NaN|
|P007 | NaN| NA| -Inf| NaN|
|P008 | NaN| NA| -Inf| NaN|
|P009 | NaN| NA| -Inf| NaN|
|P010 | NaN| NA| -Inf| NaN|
- debrief_q1_adaptation_noticed :
|pid |unique_vals | n_non_na|
|:----|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------:|
|P001 |I noticed size differences, but no dynamic growth. I did not notice any text change. | 27|
|P002 |I saw that the interface and targets changed, but I did not realize that it was a result of my performance. | 27|
|P003 |No | 27|
|P006 |sometimes they got bigger | 27|
|P007 |No | 27|
|P013 |I noticed the jitter and drift. At first, I thought this is a design flaw! Then I realized that this has been planted for a reason! | 27|
|P015 |I did not notice any change | 27|
|P016 |I noticed the targets change size over the trials, but did not realize they were changing in response to my performance. | 27|
|P019 |No | 27|
|P021 |I didn't really notice the interface change, specifically because I made mistakes or no, however I primarily focused on the gaze model (the jitter portion where I was aiming) to see if it changed or not, due to whether I made more mistakes. I was unable to figure out if it did change or not, though. One thing I noticed was that sometimes the input initially would tell me I would be using the hand mode, but then instead I would use the gaze mode and vice versa. | 27|
**Scaling following triggers:**
|adaptation_triggered |timeout_triggered | width_scale_factor| alignment_gate_false_triggers|debrief_q1_adaptation_noticed | n_trials| mean_width_scale| pct_scaled|
|:--------------------|:-----------------|------------------:|-----------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------:|----------------:|----------:|
|FALSE |FALSE | 1| 0|During Gaze Mode it was hard to track the target due to bloom (circles) appearing continuously as I tried to hit the targets. | 24| 1| 0|
|FALSE |FALSE | 1| 0|I constantly tried to get the objects. The objects get 2 different shapes and variable sizes. I tried to catch the objects. Sometimes, rectangular objects also moved away from mouse. | 23| 1| 0|
|FALSE |FALSE | 1| 0|I did not notice | 25| 1| 0|
|FALSE |FALSE | 1| 0|I did not notice the interface changing. | 27| 1| 0|
|FALSE |FALSE | 1| 0|I did notice that sometimes targets were inflated, but didn't figure out that the inflation would be related to my prior responses. I did not notice any changes related to the decluttered interface. | 27| 1| 0|
|FALSE |FALSE | 1| 0|I didn't actually notice that the dot size for hand mode was connected to my performance. For gaze mode, I noticed an orange shape sometimes, which occasionally delayed my reaction time, but didn't notice the text dimming that was mentioned above. | 25| 1| 0|
|FALSE |FALSE | 1| 0|I didn't make that many mistakes, so I didn't feel the change as much... | 27| 1| 0|
|FALSE |FALSE | 1| 0|I didn't notice the interface changing but I did notice the start button shifting each time. | 25| 1| 0|
|FALSE |FALSE | 1| 0|I didn't really notice a pattern adapting to my performance - maybe it was because I wasn't really sure how I was doing until the end | 20| 1| 0|
|FALSE |FALSE | 1| 0|I didn't really notice the interface change, specifically because I made mistakes or no, however I primarily focused on the gaze model (the jitter portion where I was aiming) to see if it changed or not, due to whether I made more mistakes. I was unable to figure out if it did change or not, though. One thing I noticed was that sometimes the input initially would tell me I would be using the hand mode, but then instead I would use the gaze mode and vice versa. | 25| 1| 0|
Width Scaling (Target Size Adaptation)
Research Question: Does the adaptive UI dynamically change target sizes? How does width scaling relate to performance?
Sample Size: N = 74 participants with width scaling data.
Status: In the current dataset, the width scaling mechanism was disabled/misconfigured; all recorded width_scale_factor values equal 1.0. Results here serve as a template for future analysis once scaling is active.
The adaptive UI may scale target widths based on performance. This section examines whether and how target sizes are adjusted.
**Note:** No target width scaling was observed in this dataset.
All `width_scale_factor` values are 1.0 (no scaling applied).
This indicates that the adaptive policy did not trigger during data collection.
Possible reasons:
- Hysteresis gate threshold not met (requires N consecutive slow/error trials)
- Performance thresholds (RT p75, error burst) not exceeded
- Adaptive policy not properly configured or enabled
- Participants performed well enough that adaptation was not needed
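To illustrate the first possibility, the sketch below shows how a gate that requires N consecutive slow or error trials behaves. It is a hypothetical reconstruction for illustration only: the class name, the `n_required` and `rt_threshold_ms` parameters, and the reset rule are assumptions, not the study's actual adaptive policy.

```python
class HysteresisGate:
    """Open only after `n_required` consecutive slow or error trials (hypothetical)."""

    def __init__(self, n_required: int, rt_threshold_ms: float):
        self.n_required = n_required
        self.rt_threshold_ms = rt_threshold_ms
        self.streak = 0

    def update(self, rt_ms: float, is_error: bool) -> bool:
        """Record one trial; return True if the gate is open after this trial."""
        if is_error or rt_ms > self.rt_threshold_ms:
            self.streak += 1
        else:
            self.streak = 0  # a single good trial resets the streak
        return self.streak >= self.n_required

# Hypothetical trial sequence: (RT in ms, error flag).
trials = [(900, False), (1600, False), (1000, True), (1700, False), (800, False)]
gate = HysteresisGate(n_required=3, rt_threshold_ms=1500.0)
results = [gate.update(rt, err) for rt, err in trials]
print(results)
```

Under a rule of this kind, a participant who rarely strings together several poor trials never opens the gate, which would leave `width_scale_factor` at 1.0 throughout. This does not distinguish a strict threshold from a misconfiguration; both produce the same constant log.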
Target Width Scaling by Condition (N = 74 participants, No Scaling Observed)
|Modality |UI Mode  |Pressure | N participants| N trials|   |   |   |   |   |
|:--------|:--------|:--------|--------------:|--------:|--:|--:|--:|--:|--:|
|hand     |static   |0        |             72|     1971|  1|  0|  0|  0|  0|
|hand     |static   |1        |             74|     2025|  1|  0|  0|  0|  0|
|hand     |adaptive |0        |             73|     1998|  1|  0|  0|  0|  0|
|hand     |adaptive |1        |             74|     2025|  1|  0|  0|  0|  0|
|gaze     |static   |0        |             72|     1971|  1|  0|  0|  0|  0|
|gaze     |static   |1        |             73|     1998|  1|  0|  0|  0|  0|
|gaze     |adaptive |0        |             74|     2025|  1|  0|  0|  0|  0|
|gaze     |adaptive |1        |             71|     1944|  1|  0|  0|  0|  0|
**Note:** The width scale factor plots (by condition, over time, and versus performance) are not shown because all width scale factors are 1.0 (no scaling occurred). Plots of constant values would not be informative.
**Why this matters:** The scale-factor-versus-performance plot would show whether larger targets (scale factor > 1.0) improve performance by reducing movement time.
Because the adaptive policy did not trigger during data collection, all targets remained at their nominal size, so the width scale factor has no variation and its relationship to performance cannot be assessed.
Alignment Gate Metrics
Research Question: If alignment gates are used, how do they affect performance? How often are false triggers detected?
Alignment gates may be used to ensure proper cursor alignment before selection. This section examines their usage and effectiveness.
**Alignment Gate Interpretation:** False triggers were rare (mean = 0.06 per trial). Adaptive UI did not show a meaningful change in false trigger rate compared to Static.
ℹ **Note:** Gaze modality shows zero false triggers (alignment gates are hand-only). Plot shows hand modality only.
ℹ **Note:** No recovery time data (per-trial or mean) are available for gaze modality.
This is expected when no false triggers are recorded: with alignment gates applied to hand trials only, gaze trials have no recovery episodes to measure.
Task Type Analysis
Research Question: Are there different task types (point vs. drag)? How does performance differ across task types?
If the experiment includes different task types, this section examines performance differences.
Performance by Task Type
| drag | hand | static | 2664 | 1101.0 | 475.0 | 1.31 |
| drag | hand | adaptive | 2682 | 1096.7 | 466.0 | 1.53 |
| drag | gaze | static | 2553 | 1203.3 | 582.1 | 17.04 |
| drag | gaze | adaptive | 2592 | 1228.9 | 539.7 | 16.40 |
| point | hand | static | 1332 | 1085.5 | 435.9 | 1.35 |
| point | hand | adaptive | 1341 | 1094.1 | 442.1 | 1.12 |
| point | gaze | static | 1279 | 1194.2 | 541.5 | 16.73 |
| point | gaze | adaptive | 1289 | 1244.8 | 529.7 | 16.06 |
Planned Sample Size & Power
Path-length efficiency (actual path / straight-line amplitude) is analyzed at the trial level but interpreted as a within-subject continuous outcome, with expected medium modality differences (longer, less efficient paths for gaze) and small-to-medium UI-mode effects. We treat N = 48 as a reasonable “good N” for detecting medium effects (dz ≈ 0.4–0.5), and N = 64 as an ideal target if path efficiency becomes more central to the argument. At both Ns, this analysis is secondary to the core throughput and RT results.
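As a minimal sketch of the ratio named above (actual path / straight-line amplitude), the snippet below computes it from a list of cursor samples. The sample format, the function names `path_length` and `path_ratio`, and the use of the start-to-end distance as a stand-in for the nominal amplitude are all assumptions made for illustration; the report's own pipeline may define these differently.

```python
import math


def path_length(points):
    """Sum of Euclidean distances between consecutive (x, y) samples."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def path_ratio(points):
    """Actual path length divided by the straight-line start-to-end distance."""
    straight = math.dist(points[0], points[-1])
    if straight == 0:
        raise ValueError("start and end coincide; ratio is undefined")
    return path_length(points) / straight


# Hypothetical cursor samples: a detour through (300, 400) on the way to (600, 0).
detour = [(0, 0), (300, 400), (600, 0)]
ratio = path_ratio(detour)

print(ratio)      # travelled 1000 px for a 600 px straight-line distance
print(1 / ratio)  # inverse form, which equals 1 for a perfectly straight path
```

A ratio of 1 means the cursor travelled the straight line; larger values mean more excess path. Which of the two forms (ratio or its inverse) a given table column reports should be checked against the analysis code.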
Path Length and Efficiency Metrics by Condition
| hand | static | 0 | 1624 | 715.5 | 371.2 | 2.28 | 0.535 | 344.3 | 1050.3 |
| hand | static | 1 | 1657 | 717.2 | 373.0 | 2.28 | 0.536 | 344.2 | 1068.6 |
| hand | adaptive | 0 | 1661 | 717.2 | 373.6 | 2.28 | 0.538 | 343.6 | 1063.9 |
| hand | adaptive | 1 | 1666 | 715.4 | 371.8 | 2.27 | 0.535 | 343.6 | 1064.0 |
| gaze | static | 0 | 1339 | 616.8 | 351.2 | 1.92 | 0.588 | 265.6 | 1175.5 |
| gaze | static | 1 | 1332 | 644.0 | 355.0 | 1.98 | 0.575 | 289.0 | 1171.1 |
| gaze | adaptive | 0 | 1358 | 632.5 | 351.3 | 1.95 | 0.583 | 281.2 | 1225.9 |
| gaze | adaptive | 1 | 1343 | 681.9 | 356.2 | 2.07 | 0.554 | 325.7 | 1232.9 |
⚠ Cannot create ID bins: insufficient variation or invalid break points.
Skipping ID binning plot.
16. Gaze-Specific Analysis: Hover/Dwell Time
Research Question: How does hover/dwell time vary across gaze conditions? Does adaptive UI affect dwell time before confirmation?
Planned Sample Size & Power
Hover/dwell time is modeled only for gaze trials with fixed effects for UI mode and pressure. Because this shrinks the effective dataset and the expected UI-mode effects may be small-to-medium (dz ≈ 0.3–0.5), we treat this analysis as exploratory unless N ≥ 64. At N = 48, the study is adequately powered for medium effects but underpowered for smaller ones; at N = 64, we expect ≈0.80 power even if the UI-mode effect is closer to dz ≈ 0.35, based on standard repeated-measures calculations and mixed-model heuristics (Cohen, 1988; Kumle et al., 2021).
Sample Size: N = 0 participants with gaze hover/dwell data (no data available).
Hover/dwell time represents the duration the cursor remains in the target before confirmation in gaze trials. This metric is specific to gaze modality and reflects the “Midas touch” problem—the need for deliberate confirmation to avoid unintended selections.
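To make the metric concrete, here is a minimal sketch of how dwell time before confirmation could be derived from a per-sample log. The log format (time-ordered `(t_ms, inside_target)` tuples), the function name, and the rule that leaving the target resets the dwell clock are assumptions for illustration, not a description of how the experiment records this field.

```python
def dwell_before_confirmation(samples, confirm_t_ms):
    """Time from the most recent target entry to confirmation, or None.

    samples: time-ordered (t_ms, inside_target) tuples (hypothetical format).
    """
    entry_t = None
    for t_ms, inside in samples:
        if t_ms > confirm_t_ms:
            break
        if inside and entry_t is None:
            entry_t = t_ms   # cursor just entered the target
        elif not inside:
            entry_t = None   # cursor left; an earlier entry no longer counts
    return None if entry_t is None else confirm_t_ms - entry_t


# The cursor enters at 100 ms, slips out at 150 ms, re-enters at 200 ms,
# and the selection is confirmed at 320 ms.
log = [(0, False), (100, True), (150, False), (200, True), (250, True), (300, True)]
print(dwell_before_confirmation(log, confirm_t_ms=320))  # 120
```

Returning `None` when the cursor was outside the target at confirmation keeps such trials distinguishable from a genuine zero dwell.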
⚠ No valid hover/dwell time data available for gaze trials.
Statistical Analysis: Hover/Dwell Time
⚠ Insufficient data for Hover/Dwell Time statistical tests.
- Hierarchical LBA (verification-time RTs) - see Section 17
- Control-theory kinematics (velocity profiles, submovement decomposition) - see Section 18
Implementation Notes:
- LBA requires RT data from the verification phase (time from target entry to selection)
- Model fitting can be done using the rtdists package, which provides LBA distribution functions (RWiener covers the Wiener diffusion model, not the LBA)
- Key parameters to estimate: drift rate (v), threshold (b), upper bound of the start-point distribution (A), non-decision time (t0)
- Hypothesis: Adaptive conditions should show lower threshold (b), indicating less caution needed
17. Linear Ballistic Accumulator (LBA) Analysis
Research Question: Can we model the verification phase (time from target entry to selection) using LBA parameters? Do adaptive conditions show different decision thresholds?
Linear Ballistic Accumulator models decompose reaction time into decision and non-decision components. For gaze-based interaction, we hypothesize that adaptive UI reduces decision threshold (b), indicating less caution needed when targets are easier to acquire.
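The decomposition can be illustrated with a small simulation of the standard LBA: each accumulator starts at a point drawn uniformly from [0, A], rises linearly at a drift rate drawn from a normal distribution, and the first to reach threshold b determines the response, with t0 added as non-decision time. This is a generic textbook sketch, not the hierarchical model fitted in this report; the function names and all parameter values are illustrative assumptions.

```python
import random


def simulate_lba_trial(v, b, A, t0, s=1.0, rng=random):
    """Simulate one LBA trial; returns (winner_index, rt_seconds).

    v: mean drift rate per accumulator; b: threshold (b >= A);
    A: upper bound of the uniform start point; t0: non-decision time;
    s: between-trial SD of the drift rate.
    """
    while True:
        finish = []
        for mean_drift in v:
            start = rng.uniform(0.0, A)
            drift = rng.gauss(mean_drift, s)
            # Accumulators with non-positive drift never reach the threshold.
            finish.append((b - start) / drift if drift > 0 else float("inf"))
        best = min(finish)
        if best != float("inf"):  # resample if no accumulator finishes
            return finish.index(best), t0 + best


def mean_rt(b, n=5000, seed=1):
    """Mean simulated RT for threshold b (illustrative values only)."""
    rng = random.Random(seed)
    rts = [simulate_lba_trial([3.0, 1.0], b, A=0.5, t0=0.2, rng=rng)[1]
           for _ in range(n)]
    return sum(rts) / n


# Not estimates from this study: a lower threshold shortens simulated RTs.
print(mean_rt(b=1.0), mean_rt(b=1.5))
```

Because both calls use the same seed, they see the same start points and drifts, so the comparison isolates the threshold: lowering b shortens every simulated decision time, which is the sense in which a lower b corresponds to less caution.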
Sample Size & Power
The hierarchical LBA analysis is run on verification-time RTs with parameters (v, b, A, t₀) varying by modality and UI mode. Power and parameter recovery in diffusion/accumulator models depend more on trials per participant than on sheer N, but group-level comparisons still require a sufficient number of participants. Studies on parameter recovery for DDM/LBA and related models generally recommend ≥100 trials per condition and at least 30–40 participants for stable hierarchical estimates. Our design (≈24 trials × 8 conditions ≈ 192 trials per participant) provides a solid total trial count per participant, but the ≈24 trials per condition fall well below the ≥100-per-condition guideline, so condition-level estimates depend on pooling information across participants. For group-level parameter differences, a target of N ≥ 64 is advisable for narrower credible intervals on parameter contrasts.
**LBA Analysis Results**
Parameters estimated using hierarchical Bayesian LBA model (PyMC).
**LBA Parameters by Modality and UI Mode:**
Table: LBA Parameter Estimates
|Modality |UI_Mode | t0_mu| vc_base_mu| vc_slope_mu| gap_int_mu| gap_slope_mu| ve_mu|
|:--------|:--------|------:|----------:|-----------:|----------:|------------:|------:|
|hand |static | -0.565| -3.673| -1.846| -0.619| 0.098| -4.952|
|hand |adaptive | -0.588| -3.673| -1.846| -0.619| 0.098| -4.952|
|gaze |static | -0.591| -3.673| -1.846| -0.619| 0.098| -4.952|
|gaze |adaptive | -0.561| -3.673| -1.846| -0.619| 0.098| -4.952|
**Model Diagnostics:**
- MCMC trace saved to: `lba_trace.nc`
- Trace plots available: `lba_trace_plot.png`
- Parameter summary: `lba_parameters_summary.csv`
**Note:** Review trace plots and R-hat diagnostics to assess convergence.
**Parameter Interpretation:**
- **t0 (non-decision time):** Time for stimulus encoding and motor execution, varies by modality and UI mode
- **vc_base (drift rate base):** Baseline accumulation rate for correct responses
- **vc_slope (drift rate slope):** How drift rate changes with task difficulty (ID)
- **gap_int (threshold gap intercept):** Baseline decision threshold above start point
- **gap_slope (threshold gap slope):** How threshold changes with pressure (speed-accuracy tradeoff)
- **ve (error drift rate):** Accumulation rate for error responses
18. Control Theory Analysis: Submovement Models
Research Question: How does the control loop efficiency differ across conditions? Do adaptive interventions reduce movement corrections?
Sample Size & Power
Trajectory-based kinematic metrics (velocity profiles, jerk, normalized jerk, primary vs corrective phases) are rich but correlated and often noisier than basic RT/TP measures. Because they are derived from the same trial-level data, their within-subject effect sizes are likely small-to-medium, with substantial individual differences. For these analyses, N = 64 is a good target for stronger inferential claims about UI-mode improvements in movement smoothness or control-loop efficiency. As with LBA, simulation-based power analyses tailored to these specific metrics would be ideal but are beyond the scope of this report (Kumle et al., 2021).
Submovement metrics in this report include pre-computed submovement_count (see Section 10). Full trajectory-based control-theory models (jerk, duration-normalized jerk, primary vs corrective phases) can be implemented using trajectory logging data.
The Optimized Submovement Model [@meyer1988] posits that pointing movements are composed of a primary ballistic impulse followed by n corrective submovements. The Submovement Count (N_sub) serves as a proxy for the efficiency of the control loop. In gaze-based interaction, simulated lag and saccadic blindness force users into an intermittent control regime, theoretically increasing N_sub.
Power Analysis Summary:
- N=48 provides ≈0.80 power for medium main effects (dz≈0.41); the N=64 target extends ≈0.80 power to smaller effects (dz≈0.35)
- Interactions may be underpowered unless large (treat as exploratory)
- 60fps trajectory data improves measurement precision but doesn’t increase effective N
- Key considerations: use duration-normalized smoothness metrics, control for multiple comparisons (FDR), pre-specify outcomes
- See POWER_ANALYSIS_EXPERT_RESPONSE.md for detailed recommendations
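The relation between N, dz, and power quoted in this report can be sanity-checked with a normal approximation to the paired (within-subject) t-test. This is a rough sketch rather than the calculation behind the report's figures: the function name `paired_power` is hypothetical, the approximation slightly overstates power compared with an exact noncentral-t computation, and it does not cover mixed models or interactions.

```python
from statistics import NormalDist


def paired_power(dz, n, alpha=0.05):
    """Approximate two-sided power of a paired t-test (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return NormalDist().cdf(dz * n ** 0.5 - z_crit)


for dz, n in [(0.41, 48), (0.35, 64), (0.41, 64)]:
    print(f"dz={dz}, N={n}: approx. power = {paired_power(dz, n):.2f}")
```

Under this approximation, dz ≈ 0.41 at N = 48 and dz ≈ 0.35 at N = 64 both land near 0.80, while dz ≈ 0.41 at N = 64 is closer to 0.90.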
✅ **Trajectory Data Available**
**Note:** Full trajectory processing requires parsing JSON and computing derivatives.
For this report, we use pre-computed `submovement_count` from Section 10.
Advanced trajectory processing (velocity profiles, jerk) can be implemented
using the `analysis/r/process_trajectory.R` script for detailed analyses.
- N trials with trajectory: 15309
- Participants: 71
Submovement Count (Control Loop Efficiency)
Submovement Count by Condition (N = 71 participants, using submovement_count_recomputed)
| hand | static | 0 | 63 | 1715 | 0.00 | 0.00 | 0 |
| hand | static | 1 | 65 | 1748 | 0.00 | 0.00 | 0 |
| hand | adaptive | 0 | 64 | 1738 | 0.00 | 0.00 | 0 |
| hand | adaptive | 1 | 65 | 1758 | 0.00 | 0.00 | 0 |
| gaze | static | 0 | 69 | 1545 | 8.51 | 4.76 | 8 |
| gaze | static | 1 | 68 | 1501 | 8.68 | 4.62 | 8 |
| gaze | adaptive | 0 | 70 | 1572 | 8.85 | 5.26 | 8 |
| gaze | adaptive | 1 | 68 | 1540 | 9.06 | 5.04 | 8 |
Interpretation: Lower submovement counts indicate smoother, more ballistic movements. Adaptive UI is expected to reduce corrective submovements by expanding targets.
ℹ **Note:** Hand modality shows zero submovements (smooth movements). Plot shows gaze modality only.
Statistical Model: Submovement Count
### Model: Submovement Count
**Note:** Hand modality shows zero submovements (smooth, ballistic movements).
Gaze modality shows 8.8 submovements on average.
Modeling gaze-only data to test UI mode and pressure effects.
**Model:** log(submovement_count + 1) ~ ui_mode * pressure + (1 | pid) [Gaze modality only]
**Data Summary:** 71 participants, 6158 trials (gaze only).
**Rationale:** Hand modality shows zero submovements (smooth movements), so analysis focuses on gaze where submovements are present.
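One consequence of modeling log(submovement_count + 1) is that means computed on the log scale and then back-transformed are systematically lower than raw arithmetic means of the same data. The sketch below shows this with hypothetical counts; whether any particular mean in this report is a raw or a back-transformed value is not stated and should be confirmed from the analysis code.

```python
import math

counts = [2, 5, 8, 9, 20]  # hypothetical gaze submovement counts

raw_mean = sum(counts) / len(counts)
log_mean = sum(math.log1p(c) for c in counts) / len(counts)
back_transformed = math.expm1(log_mean)  # exp(mean(log(x + 1))) - 1

print(raw_mean, back_transformed)
```

The back-transformed value behaves like a geometric mean of (count + 1), minus 1, so a few large counts pull it up less than they pull up the arithmetic mean. Raw and model-based means for the same condition are therefore not directly comparable.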
#### ANOVA Table
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ui_mode 0.37081 0.37081 1 6089.6 3.2036 0.07352 .
pressure 0.73574 0.73574 1 6087.8 6.3564 0.01172 *
ui_mode:pressure 0.07272 0.07272 1 6093.6 0.6283 0.42801
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#### Written Results (APA Style)
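**Modality Comparison (Descriptive):** Hand input produced zero submovements across all conditions. This is consistent with smooth, ballistic movements, but an exact zero with no variance could also mean that the detection algorithm does not register hand submovements (see Potential Issues to Check). Gaze input produced 8.8 submovements on average (SD = 4.9), consistent with intermittent control due to lag and saccadic blindness.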
**UI Mode Effect (Gaze):** A linear mixed-effects model on log-transformed submovement count for gaze modality revealed a non-significant main effect of UI mode, F(1, 6089.6) = 3.20, p = .074, η²p = .001 (negligible effect).
Adaptive UI did not reduce submovements (M = 7.72) compared to Static UI (M = 7.58) in gaze modality.
**Pressure Effect (Gaze):** The main effect of pressure on submovement count was statistically significant but negligible in size, F(1, 6087.8) = 6.36, p = .012, η²p = .001.
**UI Mode × Pressure Interaction:** The interaction was non-significant, F(1, 6093.6) = 0.63, p = .428, η²p < .001 (negligible effect).
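The effect sizes can be cross-checked against the ANOVA table with the common conversion η²p = (F × df_num) / (F × df_num + df_den). This is a sketch under the assumption that the reported η²p values were derived this way; with Satterthwaite degrees of freedom in a mixed model the conversion is only an approximation, and the report does not state which method it used.

```python
def partial_eta_squared(f_value, df_num, df_den):
    """Convert an F statistic and its degrees of freedom to partial eta squared."""
    return (f_value * df_num) / (f_value * df_num + df_den)


# F values and degrees of freedom copied from the ANOVA table above.
effects = {
    "ui_mode": (3.2036, 1, 6089.6),
    "pressure": (6.3564, 1, 6087.8),
    "ui_mode:pressure": (0.6283, 1, 6093.6),
}

for name, (f_value, df_num, df_den) in effects.items():
    print(f"{name}: eta2p = {partial_eta_squared(f_value, df_num, df_den):.4f}")
```

All three values come out at or below roughly 0.001, in line with the negligible effect sizes reported. With about 6,000 denominator degrees of freedom, even the significant pressure effect accounts for only about a tenth of a percent of variance by this measure.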
Implementation Notes:
- Basic submovement analysis is already in Section 10 (Movement Quality Metrics)
- Trajectory data is now available in the trajectory column (JSON string, logged at ~60fps)
- Current submovement_count is pre-calculated in FittsTask.tsx using velocity peaks
- Power: N=48 sufficient for main effects (dz≈0.41, power≈0.80); interactions underpowered (treat as exploratory)
- Key considerations:
  - Use duration-normalized smoothness metrics (jerk is duration-sensitive)
  - Control for multiple comparisons (FDR) if testing many kinematic features
  - Pre-specify a small set of theoretically motivated outcomes
  - 60fps improves measurement precision but doesn’t increase effective N
- See POWER_ANALYSIS_EXPERT_RESPONSE.md for detailed power analysis and recommendations
Potential Issues to Check:
- Verify that submovement_count calculation in FittsTask.tsx matches the Optimized Submovement Model definition
- Check if velocity profile data is needed or if pre-calculated counts are sufficient
- Ensure submovement detection algorithm handles both hand and gaze modalities correctly
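When checking the detection algorithm, it can help to compare the logged counts against an independent velocity-peak count computed from the trajectory samples. The sketch below is one simple way to do that and is not the logic in FittsTask.tsx: the sample format `(t_seconds, x_px, y_px)`, the function name, and the `min_speed` threshold are assumptions for illustration.

```python
import math


def count_velocity_peaks(samples, min_speed=50.0):
    """Count local maxima of speed (px/s) that exceed min_speed.

    samples: time-ordered (t_seconds, x_px, y_px) tuples (hypothetical format).
    """
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt > 0:  # skip duplicated timestamps
            speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    peaks = 0
    for prev, cur, nxt in zip(speeds, speeds[1:], speeds[2:]):
        if cur > min_speed and cur > prev and cur >= nxt:
            peaks += 1
    return peaks


# Build a synthetic 60 fps trajectory from a speed profile with two bursts:
# a large primary movement followed by a smaller correction.
dt = 1 / 60
profile = [0, 100, 300, 100, 20, 150, 40, 0]
samples, x = [(0.0, 0.0, 0.0)], 0.0
for i, speed in enumerate(profile, start=1):
    x += speed * dt
    samples.append((i * dt, x, 0.0))

print(count_velocity_peaks(samples))  # 2
```

If the first peak is read as the primary impulse, the number of corrective submovements would be the peak count minus one. That reading is an interpretation of the Optimized Submovement Model framing used above, so the mapping should be confirmed against whatever definition the logged `submovement_count` follows.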
19. Summary & Conclusions
Key Findings Summary
Summary of Key Metrics by Condition (N=81)
| Modality | UI mode | Metric | Mean | SD |
|:---------|:--------|:-------|-----:|---:|
| hand | static | Effective Width (px) | 33.400 | 20.600 |
| hand | adaptive | Effective Width (px) | 34.110 | 21.200 |
| gaze | static | Effective Width (px) | 35.780 | 19.670 |
| gaze | adaptive | Effective Width (px) | 35.580 | 19.510 |
| hand | static | Error Rate (%) | 1.710 | 12.960 |
| hand | adaptive | Error Rate (%) | 1.740 | 13.080 |
| gaze | static | Error Rate (%) | 19.470 | 39.600 |
| gaze | adaptive | Error Rate (%) | 18.170 | 38.560 |
| hand | static | Movement Time (s) | 1.096 | 0.418 |
| hand | adaptive | Movement Time (s) | 1.090 | 0.406 |
| gaze | static | Movement Time (s) | 1.176 | 0.482 |
| gaze | adaptive | Movement Time (s) | 1.239 | 0.552 |
| hand | static | Throughput (bits/s) | 3.550 | 0.950 |
| hand | adaptive | Throughput (bits/s) | 3.530 | 0.950 |
| gaze | static | Throughput (bits/s) | 3.220 | 1.070 |
| gaze | adaptive | Throughput (bits/s) | 3.140 | 1.100 |
Data Quality Notes
- Participants: 81
- Valid Trials: 14953 (out of 17442 total experimental trials)
- Exclusion Rate: 14% (due to errors, timeouts, or invalid RTs)
- Trials per Participant: Mean = 184.6, Range = 82 - 406
Target Sample: N=64 participants for enhanced power in advanced analyses (LBA, control-theory kinematics).